<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Generalizable Training Techniques for Fine-Grained Long-Tailed Image Recognition: Transferring Methods Optimized for FungiCLEF 2024 to SnakeCLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Jack</forename><forename type="middle">N</forename><surname>Etheredge</surname></persName>
							<email>jack.etheredge@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Twosense</orgName>
								<address>
									<settlement>New York</settlement>
									<region>New York</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Generalizable Training Techniques for Fine-Grained Long-Tailed Image Recognition: Transferring Methods Optimized for FungiCLEF 2024 to SnakeCLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">22DED9A8AF5FD3481E59A4C23F005189</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:51+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Fine-grained classification</term>
					<term>Long-tailed</term>
					<term>Metaformer</term>
					<term>CAFormer</term>
					<term>SnakeCLEF</term>
					<term>FungiCLEF</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Accurate identification of species in fine-grained, long-tailed datasets poses significant challenges due to imbalanced class distributions and the necessity for precise classification while minimizing confusion between dangerous and harmless species. This paper introduces a generalized training and inference methodology designed to tackle these challenges, demonstrated through competitive performance in both the SnakeCLEF and FungiCLEF 2024 challenges. While results for FungiCLEF 2024 are detailed in an accompanying paper, this work primarily explores the application and performance of the same techniques to the SnakeCLEF 2024 challenge. The proposed approach integrates a combination of augmentation techniques, specialized loss functions, and robust model architectures to enhance classification accuracy while jointly minimizing the asymmetric penalty for misclassification of venomous species. For both the public and private leaderboards, my approach achieved second place in all metrics. On the public leaderboard, it scored 81.2 for Track 1, 945 for Track 2, and 33.35 for the F1 score. On the private leaderboard, it scored 79.58 for Track 1, 2557 for Track 2, and 30.29 for the F1 score. These experimental results validate the effectiveness of this methodology, showcasing its robustness across diverse datasets and evaluation metrics. The versatility of this approach indicates its potential applicability to a wide range of similar image recognition tasks. Code and implementation details are available at https://github.com/Jack-Etheredge/snakeclef2024.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Venomous snake bites cause over half a million deaths and disabilities annually, highlighting the need for an effective image-based snake identification system <ref type="bibr" target="#b0">[1]</ref>. Such a system could enhance global health efforts, improve ecological and epidemiological data, and optimize antivenom distribution <ref type="bibr" target="#b1">[2]</ref>. To this end, the SnakeCLEF 2024 challenge <ref type="bibr" target="#b2">[3]</ref> is organized with metrics for both the general misclassification rate and distinct penalties for the confusion of venomous snakes with other venomous snake species and the confusion of venomous snakes with harmless snakes.</p><p>Fine-grained long-tailed image recognition is a challenging task due to the need for high granularity in distinguishing between visually similar classes, compounded by significant class imbalance. Competitions like SnakeCLEF and FungiCLEF, both part of the LifeCLEF 2024 <ref type="bibr" target="#b3">[4]</ref> lab, provide platforms for developing and benchmarking methodologies to tackle these issues. SnakeCLEF 2024 focuses on snake species classification, while FungiCLEF 2024 <ref type="bibr" target="#b4">[5]</ref> targets fungi species, including the identification of unknown species and minimizing misclassification between edible and poisonous varieties. Despite differences in datasets and evaluation metrics, both competitions share challenges inherent to fine-grained long-tailed classification, making them ideal for testing the generalizability of my proposed method.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Many different techniques have been explored for the classification of fine-grained images of snakes <ref type="bibr" target="#b5">[6]</ref> and fungi <ref type="bibr" target="#b6">[7]</ref>. Recent work on both tasks has shown the important role that the inclusion of metadata can play in the final classification performance of different techniques <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12]</ref>. This year, however, metadata was excluded from the test set for SnakeCLEF. One effect of this is that the geographic regions that the snakes belong to cannot be directly utilized by the models, nor can challengers focus their efforts on training the classes that belong to the geographic regions present in the test set. Various loss functions and architectures have been successfully applied to SnakeCLEF to deal with the long-tailed, fine-grained nature of the data. Seesaw loss <ref type="bibr" target="#b12">[13]</ref> and real-world weighted cross-entropy <ref type="bibr" target="#b13">[14]</ref> were used by <ref type="bibr" target="#b9">[10]</ref>. Focal loss <ref type="bibr" target="#b14">[15]</ref> and ArcFace loss <ref type="bibr" target="#b15">[16]</ref> were both utilized by <ref type="bibr" target="#b11">[12]</ref>. Interestingly, this solution also utilized a training dataset preprocessing step of cropping the images to the region of interest containing the snake. ArcFace and SimCLR <ref type="bibr" target="#b16">[17]</ref> were used by <ref type="bibr" target="#b8">[9]</ref>. ConvNeXt <ref type="bibr" target="#b17">[18]</ref> and Metaformer <ref type="bibr" target="#b18">[19]</ref> were top-performing model architectures in last year's challenge <ref type="bibr" target="#b5">[6]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Dataset</head><p>The SnakeCLEF dataset consists of 182,261 images across 1,784 snake species. The training data includes geographical location metadata. FungiCLEF's dataset comprises 295,938 training images of 1,604 species with extensive metadata, including habitat and location. While the metadata was present in the test data for FungiCLEF 2024, it was absent from the test data for SnakeCLEF 2024.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Competition Objectives and Metrics</head><p>Both competitions aim to enhance species recognition accuracy, albeit with differing focuses. SnakeCLEF 2024 evaluates class-balanced metrics, emphasizing the importance of correctly classifying venomous vs. non-venomous species without leveraging metadata at inference time, blinding the models to the geographic region. FungiCLEF 2024 includes an open-set component for identifying unknown species and penalizes misclassifications between edible and poisonous fungi. The venomous confusion loss for SnakeCLEF is more complex than the poisonous confusion loss for FungiCLEF, with different costs for misclassification between two venomous classes (2), between two nonvenomous classes (1), venomous → nonvenomous confusion (5), and nonvenomous → venomous confusion (2). Both competitions report the macro-F1 score, but SnakeCLEF additionally incorporates it into the Track 1 score. Track 1 is a weighted average of the accuracies for the four different confusion categories and the macro-averaged F1. Accuracy is also reported for both competitions, but is largely ignored for the results shown in this paper, as it is not reported for the granular results in the overview of either competition last year <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Training Techniques</head><p>To address the long-tailed distribution and fine-grained nature of the datasets, I employed a combination of training techniques and test-time augmentations detailed below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1.">Data Augmentation</head><p>Training was performed with a resize to 768 with bicubic interpolation, square random crop of size 384, TrivialAugment <ref type="bibr" target="#b19">[20]</ref>, horizontal flip with 50% probability, and random erasing <ref type="bibr" target="#b20">[21]</ref> with a probability of 25%, applied in that order. MixUp augmentation <ref type="bibr" target="#b21">[22]</ref> and augmentations inspired by it were intentionally excluded due to the fine-grained nature of the dataset, which represents higher intra-class variability and lower inter-class variability than standard classification tasks. However, mixing augmentations in the form of CutMix <ref type="bibr" target="#b22">[23]</ref> and RandoMix <ref type="bibr" target="#b23">[24]</ref> were previously employed successfully by <ref type="bibr" target="#b9">[10]</ref>. Future work could explore the use of different data augmentations including MixUp and similar techniques during training.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2.">Loss Functions</head><p>Multiple loss functions were evaluated for the classification loss. Seesaw loss <ref type="bibr" target="#b12">[13]</ref> and a custom venom loss were used to train the models in the final ensemble. Seesaw loss was chosen since it is designed for long-tailed classification. Further, it achieves this without the need for class rebalancing through data sampling by adding additional terms to the standard cross-entropy loss. It employs a mitigation factor to reduce penalties for tail categories based on the ratio of training instances as well as a compensation factor to increase penalties for misclassified instances, thereby reducing the otherwise overwhelming effect of false positives in the tail classes.</p><p>A custom venom loss was added to seesaw loss to create the total loss during training. This cost function was formulated by creating a pairwise cost for the confusion for every combination of the target and predicted class. The vector corresponding to the target class was indexed from this cost matrix and the softmax probabilities were multiplied elementwise with the cost vector. The sum of these costs was used as the venom loss. This loss is similar to the real-world weighted cross entropy loss <ref type="bibr" target="#b13">[14]</ref>, but uses the costs directly instead of utilizing a weighted log loss. Future work could investigate the relative performance of these two loss functions. Since the venom confusion metric is calculated based on the percentage of misclassifications, it is a class-balanced metric. As such, I also experimented with the application of an inverse class weight to the venom loss to account for class imbalance (results shown in Table <ref type="table" target="#tab_4">5</ref>).</p><p>Balanced sampling is a simpler alternative to seesaw loss for mitigating the effect of class imbalance. 
For each epoch, samples were drawn with replacement from the training data with a probability inversely proportional to the number of samples belonging to that class. Focal loss <ref type="bibr" target="#b14">[15]</ref> penalizes misclassifications for difficult-to-classify samples by reducing the loss for well-classified examples (high predicted probability for the correct class) relative to standard cross-entropy loss. This is done in an attempt to dynamically put more focus on difficult examples during training. Since the tail classes will likely be more difficult to classify, focal loss should in theory work in conjunction with balanced sampling to improve tail-class classification.</p><p>Another loss that was evaluated was sub-center ArcFace loss <ref type="bibr" target="#b24">[25]</ref>. Sub-center ArcFace loss is a refinement to ArcFace that allows multiple cluster centers per class, which seemed better suited to snake classification than the original ArcFace loss, since snakes of the same species can vary widely in their appearance due to age and other factors. Losses that operate directly on the embedding of the model rather than a dense classification are typically used in conjunction with clustering or a distance-based classification relative to ground-truth embeddings per class. Instead, I tested the addition of sub-center ArcFace loss to the seesaw and custom venom losses.</p><p>LogitNorm <ref type="bibr" target="#b25">[26]</ref> was applied to the logits during training before seesaw loss or venom loss were applied. This was done for parity with the models used for the FungiCLEF 2024 challenge <ref type="bibr" target="#b26">[27]</ref>. LogitNorm increases class separation in the embedding space of the classifier and calibrates the model probabilities. 
Since the class with the highest predicted probability was selected as the classification in every case, probability calibration was assumed to be of no consequence for individual model classifications. However, since the probabilities are averaged in the model ensembles, it is possible that probability calibration could have an impact on the performance of ensembles.</p></div>
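The custom venom loss described in this section can be sketched as follows; the cost values follow the competition's confusion scheme, while the function names, zero cost on the diagonal, and the use of precomputed softmax probabilities are illustrative assumptions:

```python
import numpy as np

def build_cost_matrix(venomous):
    """Pairwise misclassification costs (rows = true class, cols = predicted),
    following the SnakeCLEF scheme: venomous->venomous 2, harmless->harmless 1,
    venomous->harmless 5, harmless->venomous 2; 0 for correct predictions
    (assumed)."""
    v = np.asarray(venomous, dtype=bool)
    cost = np.where(v[:, None] & v[None, :], 2.0,
           np.where(~v[:, None] & ~v[None, :], 1.0,
           np.where(v[:, None] & ~v[None, :], 5.0, 2.0)))
    np.fill_diagonal(cost, 0.0)
    return cost

def venom_loss(probs, targets, cost):
    """Mean expected misclassification cost: index the target class's row of
    the cost matrix, weight the softmax probabilities elementwise, and sum."""
    return (probs * cost[targets]).sum(axis=1).mean()
```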
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.3.">Optimization and Training Details</head><p>The training paradigm used for the SnakeCLEF competition involved several key techniques and methodologies. The dataset was augmented using TrivialAugment and random erasing to improve the models' robustness. The AdamW optimizer <ref type="bibr" target="#b27">[28]</ref> was used with a weight decay of 0.05. The learning rate was initially set to 1e-3 for the classification output dense layer with the pretrained model frozen for the first 5 epochs, then reduced to 5e-5. Training was conducted with a batch size of 40 for CAFormer-S18, 32 for Metaformer-0, and 24 for CAFormer-S36. CAFormer models were used with weights pretrained on ImageNet-21K <ref type="bibr" target="#b28">[29]</ref>, while Metaformer-0 was used with weights pretrained on iNaturalist2021 <ref type="bibr" target="#b29">[30]</ref>. A dropout rate of 0.2 was implemented between the dense output classification layer and the penultimate layer to prevent overfitting in all cases unless otherwise stated. Learning rate scheduling was employed, reducing the rate by a factor of 0.1 if the model did not improve the validation loss for 5 consecutive epochs. Early stopping was implemented to prevent overfitting and conserve computational resources. The models were fine-tuned using CAFormer-S18, and a 4x ensemble approach was adopted, utilizing different data splits to improve generalization.</p></div>
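The optimization setup described above might look like the following sketch in PyTorch (the two-phase helper and its name are illustrative; the text specifies the learning rates, weight decay, and plateau schedule):

```python
import torch
from torch import nn

def build_optimizer_and_scheduler(model: nn.Module, backbone_frozen: bool):
    """AdamW with weight decay 0.05; lr 1e-3 while only the classifier head
    trains (first 5 epochs, backbone frozen), then 5e-5 for full fine-tuning.
    ReduceLROnPlateau cuts the lr by 10x after 5 epochs without validation
    loss improvement, matching the schedule described in the text."""
    lr = 1e-3 if backbone_frozen else 5e-5
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr, weight_decay=0.05)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=5)
    return optimizer, scheduler
```

In use, `scheduler.step(val_loss)` would be called once per epoch, with early stopping tracked separately on the same validation loss.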
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Inference Techniques</head><p>During inference, several techniques were applied to maximize performance. Test-time augmentations were used, including horizontal flips and multi-instance averaging, to increase the robustness of predictions. The resolution of CAFormer-S18 and CAFormer-S36 models was adjusted by resizing from 384x384 to 576x576 for higher resolution inference. Additionally, ensemble averaging was employed, combining predictions from multiple models to improve overall accuracy. Models were ensembled by simple averaging of the prediction probabilities before selecting the class with the highest predicted probability as the prediction. These strategies collectively enhanced the model's performance during the inference phase as shown in Section 4.</p><p>Multi-crop refers to the generation and use of three overlapping crops that collectively ensure complete coverage of the entire image. The predicted class probabilities for each of these crops are then averaged to generate the final maximum probability classification. Horizontal flipping (hflip) augmentation involves taking the average predicted class probabilities from both the original and horizontally flipped version of each image in the same manner as multi-crop. Multi-instance refers to averaging the probabilities for each instance in cases where an observation has more than a single instance. In cases where multi-instance is not used, only the first instance for each observation is used to make each prediction. 
Image size refers to the inference image size that the image was resized to before a square center crop (or multiple square crops in the case of multi-crop) of the same resolution is taken.</p><p>Taken collectively, if multi-instance and hflip test-time augmentations are both used with an ensemble of CAFormer models, the inference procedure would be as follows: Every image (instance) belonging to each observation would be 1) resized and center-cropped to 576x576, and 2) horizontally flipped to keep both the original and mirrored image. Then, probabilities would be generated for each model for both flips of every instance. Finally, the simple average of all of these probabilities would be calculated to determine the class prediction based on the maximum class probability after averaging.</p></div>
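The multi-crop and averaging steps above can be sketched as follows; the exact left/center/right crop layout is an assumption, since the text specifies only three overlapping crops with full coverage:

```python
import numpy as np

def three_crop_boxes(width, height):
    """Three overlapping square crops that jointly cover the whole image:
    left/center/right for landscape images, top/middle/bottom for portrait
    (layout assumed). Boxes are (x0, y0, x1, y1)."""
    s = min(width, height)
    if width >= height:
        offsets = [0, (width - s) // 2, width - s]
        return [(x, 0, x + s, s) for x in offsets]
    offsets = [0, (height - s) // 2, height - s]
    return [(0, y, s, y + s) for y in offsets]

def average_probabilities(prob_list):
    """Simple averaging of predicted class probabilities (over crops, flips,
    instances, or ensemble members) before taking the argmax."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return avg, avg.argmax(axis=-1)
```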
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Model Architectures</head><p>An ensemble of CAFormer models <ref type="bibr" target="#b18">[19]</ref> was used in the best-performing solution for this competition. These models balance computational efficiency and classification accuracy, making them suitable for both competitions. Notably, the CAFormer models performed consistently well across diverse datasets. A dropout rate of 0.2 was used between the dense output classification layer and the penultimate layer of the network.</p></div>
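A minimal sketch of the dropout placement described above, with a generic feature vector standing in for the CAFormer penultimate layer (class and parameter names are placeholders, not the paper's implementation):

```python
import torch
from torch import nn

class ClassifierHead(nn.Module):
    """Dense classification layer with dropout (p=0.2) applied between the
    penultimate features and the output logits, as described in the text."""
    def __init__(self, in_features: int, num_classes: int, drop: float = 0.2):
        super().__init__()
        self.dropout = nn.Dropout(drop)
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(self.dropout(features))
```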
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6.">Ensemble of Data Splits</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.7.">Computational Resources</head><p>All experiments were conducted on a single NVIDIA RTX 4090 graphics card, emphasizing the efficiency of my methodology given limited computational resources.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>My methodology demonstrated competitive performance in both SnakeCLEF and FungiCLEF 2024.</p><p>For SnakeCLEF, my model achieved second place in all competition metrics on the public and private leaderboards, successfully differentiating between venomous and non-venomous species without the ability to overfit to geographic regions by utilizing metadata. For FungiCLEF, my approach excelled in recognizing unknown species while minimizing edible-poisonous misclassifications. My models achieved 1st place for the Track 1 classification score, macro-F1, and accuracy, while achieving competitive performance in the other two metrics <ref type="bibr" target="#b26">[27]</ref>. FixRes involves not only inference at a higher resolution relative to training, but also fine-tuning the final layers of the model at the desired inference resolution without training augmentations. FixRes fine-tuning did not improve performance on any metric when inference was performed at a resolution of 576, as can be seen in Table <ref type="table" target="#tab_0">1</ref>.</p><p>Multiple loss functions were evaluated in addition to seesaw loss and the custom venom loss. One of these was sub-center ArcFace loss. Table <ref type="table" target="#tab_1">2</ref> shows the addition of sub-center ArcFace loss to the training of one of the models in the ensemble. CAFormer-S18 with data split A was trained both with and without the sub-center ArcFace loss. In both cases, the classifications from the dense layer were used for predictions rather than utilizing the embeddings directly. The ensemble that contained a model with sub-center ArcFace loss had poorer performance across all metrics. 
This suggests that the addition of sub-center ArcFace loss to the seesaw and custom venom losses did not further mitigate the impact of the tail classes with very few observations, despite the loss optimizing the separation of classes in the pre-classification model embedding.</p><p>Another loss that was evaluated for the multiclass classification was focal loss, which was paired with balanced sampling to directly address class imbalance. Focal loss with balanced sampling did not perform as well as seesaw loss, as can be seen in Table <ref type="table" target="#tab_2">3</ref>. Increasing the dropout rate for the penultimate layer may be slightly beneficial, with the greatest percent improvement in metrics being the F1 score (which increased 0.99), but this difference is trivial compared to the difference across all metrics for seesaw loss vs. focal loss with balanced sampling. F1 increases nearly 7 points when focal loss with balanced sampling is replaced with seesaw loss.</p><p>Initial experiments with Metaformer-0 <ref type="bibr" target="#b18">[19]</ref> showed that CAFormer-S18 gave better performance across all metrics. While Metaformer shows remarkable performance on fine-grained datasets, particularly when metadata is available, it appears that CAFormer models may be more performant when metadata is unavailable.</p><p>Table <ref type="table">6</ref> shows ensemble performance. CAFormer-S36 is also included in the comparison as a strong single-model baseline. * indicates that the model was trained with random erasing. All models were trained with seesaw loss and venom loss and used horizontal flipping, multi-instance averaging, and an image size of 576 at inference. In cases where multiple data splits are denoted, this refers to an ensemble of multiple models, one per data split (e.g. CAFormer-S18 (A*, C*) refers to an ensemble of two CAFormer-S18 models, one trained on data split A and one trained on data split C, both with random erasing).</p><p>
Class-weighted venom loss was evaluated as an alternative to the venom loss, since the custom venom loss did not account for class imbalance. In the case of two different data splits, the addition of this weight term to the venom loss negatively impacted all metrics. Results are shown in Table <ref type="table" target="#tab_4">5</ref>. Weighted venom loss did not improve the generalizability of the Track 2 score. Several different model ensembles were evaluated based on different splits of the dataset, the inclusion of random erasing, and the CAFormer-S18 vs. CAFormer-S36 architecture. Results are shown in Table <ref type="table">6</ref>. The best-performing ensemble of models on the private leaderboard comprised a CAFormer-S36 model trained on split D without random erasing and three CAFormer-S18 models trained on splits B and C with random erasing and on split D without random erasing. The private leaderboard Track 1 score of 79.96, Track 2 score of 2481, and F1 score of 30.2 achieved second place for all three metrics. An ensemble of all CAFormer-S18 models slightly outperformed this ensemble for the private leaderboard F1 score (30.29 vs. 30.2). Interestingly, this ensemble comprising solely CAFormer-S18 models performed best across all public leaderboard metrics. The all-CAFormer-S18 ensemble has the same composition as the ensemble mentioned above with the exception of replacing the CAFormer-S36 model with a CAFormer-S18 model trained on data split A without random erasing. In all cases, there was a large disparity between the public and private leaderboard performance, particularly with respect to the Track 2 venomous → harmless confusion loss. In all cases, the private Track 2 loss was over twofold higher than the public Track 2 loss.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Several test-time augmentations were evaluated, including horizontal flipping (hflip), averaging multiple crops (multi-crop), multi-instance averaging, and inference at a higher resolution than the training resolution. The results for all of these augmentations are summarized in Table <ref type="table" target="#tab_7">7</ref> with the exception of increasing the inference image resolution, the results of which are shown in Table <ref type="table">8</ref>. Each of the averaging-based test-time augmentations improves performance, in isolation or in combination. Multi-instance is the most computationally demanding, but also provides the greatest lift in performance among hflip, multi-crop, and multi-instance. Hflip provides a similar lift to multi-crop, but involves doubling rather than tripling the number of images that must pass through the models. The most impactful augmentation is inference at an image resolution of 576 instead of the training resolution of 384, as shown in Table <ref type="table">8</ref>.</p><p>Random erasing was included in many of the models in an effort to increase the generalizability of the models. It appears that too much of the class-specific information was obscured by the erasure, leading to a slight degradation in performance. As shown in Table <ref type="table" target="#tab_9">9</ref>, the public Track 1 score is worse by 0.87 while the private Track 1 score is worse by 2.31 when random erasing is included in the training augmentations.</p><p>Since the learning rate reduction and early stopping were decided based on the validation loss, it was necessary to perform all training with a training-validation split. In order to utilize all the available data and to increase the diversity of the models in the final ensemble, different training-validation splits of the data were used to train otherwise identical models. 
Table <ref type="table" target="#tab_10">10</ref> shows the impact of these different splits on the performance of the models. The difference between the best- and worst-performing splits is greater than the differences from the inclusion of LogitNorm (Table <ref type="table" target="#tab_12">12</ref>), random erasing (Table <ref type="table" target="#tab_9">9</ref>), horizontal flipping (Table <ref type="table" target="#tab_7">7</ref>), or multiple crops (Table <ref type="table" target="#tab_7">7</ref>). The difference was also greater than the difference between a larger ensemble and averaging multiple image resolutions (Table <ref type="table" target="#tab_11">11</ref>). This suggests that different splits of the data can have a significant impact on the final performance of the models, particularly if individual models are used instead of being combined into an ensemble.</p><p>Averaging the predicted probabilities from multiple image resolutions (multi-res) was investigated as a test-time augmentation. However, since multi-res requires performing inference through the same model once per resolution, the increased performance must be weighed against the increased compute cost: averaging two resolutions should cost roughly the same as an ensemble that is twice as large.</p><p>Better performance is achieved across all metrics using a larger ensemble, as shown in Table <ref type="table" target="#tab_11">11</ref>.</p><p>All final models were trained with LogitNorm. To determine whether its inclusion was beneficial, an identical model was trained without LogitNorm and evaluated using the same settings. The inclusion of LogitNorm may slightly degrade the performance of the models, as shown in Table <ref type="table" target="#tab_12">12</ref>. 
Since it significantly improves performance on FungiCLEF <ref type="bibr" target="#b26">[27]</ref>, it may be of greater benefit to open-set classification; thus, if the task will never be open-set, LogitNorm can likely be safely excluded from training. All final models were trained with venom loss. To determine whether its inclusion was beneficial, an identical model was trained without venom loss and evaluated using the same settings. As shown in Table <ref type="table" target="#tab_13">13</ref>, the custom venom loss improves performance on the Track 2 metric as expected. Since the Track 1 metric is also influenced by the venomous → harmless confusion, it is unsurprising that Track 1 would improve with the inclusion of venom loss. What is more surprising is that the F1 score was improved by the venom loss. This shows that a real-world cost matrix for pairwise class confusion can be utilized without sacrificing overall classification performance. Future work could investigate how broadly applicable this is beyond this specific dataset.</p></div>
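The LogitNorm transformation evaluated above can be sketched as follows (numpy for clarity; the temperature value is illustrative, not the one used in training):

```python
import numpy as np

def logit_norm(logits: np.ndarray, temperature: float = 0.04) -> np.ndarray:
    """LogitNorm: divide each logit vector by its L2 norm scaled by a
    temperature before applying the classification loss. This bounds the
    logit magnitude, aiding calibration and class separation."""
    norms = np.linalg.norm(logits, axis=-1, keepdims=True) + 1e-7
    return logits / (norms * temperature)
```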
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Final model ensemble and leaderboard performance</head><p>The best-performing ensembles both utilized horizontal flipping, multi-instance averaging, and a higher inference resolution of 576x576 relative to the training resolution of 384x384. An ensemble of CAFormer-S18 models trained on data splits A and D without random erasing and data splits B and C with random erasing performed best for all public leaderboard metrics as well as F1 on the private leaderboard. However, this ensemble was outperformed for Track 1 and Track 2 on the private leaderboard by swapping the CAFormer-S18 model trained on data split A for a CAFormer-S36 model trained on data split D, as shown in Table <ref type="table">6</ref> and described previously in Section 4.</p><p>Table <ref type="table" target="#tab_14">14</ref> shows the public leaderboard performance of each team and Table <ref type="table" target="#tab_4">15</ref> shows the private leaderboard performance. In both cases, my method achieves 2nd place across all metrics. Notably, the gap between my models and 3rd place is larger than the gap between my models and 1st place for all metrics. Interestingly, there is a large disparity in Track 2 performance between the public leaderboard and the private leaderboard for all participants. This suggests that either the public and private leaderboards have different data distributions or all competitors overfit their solutions to the public leaderboard. Since the other metrics do not show such a large disparity, this suggests that the ratio of difficult-to-classify venomous species may be greater in the private leaderboard test set.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>In this work, I presented a robust training and inference methodology that generalizes well across different fine-grained long-tailed image recognition tasks. The similarities between SnakeCLEF and FungiCLEF, such as asymmetric penalties for misclassification, highlight the effectiveness and generalizability of my approach. Differences, such as the lack of metadata in SnakeCLEF and the presence of unknowns in FungiCLEF, necessitated specific adjustments. Future work could explore few-shot learning techniques to further enhance performance for classes with few examples. Additional future work could investigate the potential for geographic metadata to increase model bias against the successful identification of invasive snake species in comparison to models not using that metadata. My approach's competitive performance on both SnakeCLEF and FungiCLEF 2024 suggests its potential applicability to other similar challenges.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>Models were trained on four different training-validation data splits to increase diversity and decrease correlation between the errors of the models comprising the ensemble. The dataset is originally provided as three collections of observations: training, validation, and additional training observations for rare classes. In all cases, the additional training observations are considered part of the original training set. As such, the original dataset can be considered as being provided as a single training and validation split. The A, B, and C data splits were constructed by first combining the original training and validation data for the competition. The A split used the first 90% of the observations per class as the training</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>samples and the remaining 10% of the observations per class as the validation samples. In cases where there were fewer than 4 observations per class, all observations were used for training. The B split used the last 90% of samples for training and the C split used the middle 90% of samples for training, both with the same exception regarding tail classes with very few observations. Since the original training observations come before the original validation observations in this combined dataset, the A split is most similar of these 3 splits to the original training and validation split. The D split is the original training and validation split provided by the competition.</figDesc></figure>
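The split construction described in the caption above can be sketched as follows. This is a hypothetical reconstruction, not the competition code: the helper name `make_split`, its argument names, and the exact placement of the held-out 10% for the C split (half at each end) are my assumptions.

```python
from collections import defaultdict

def make_split(observations, variant="A", min_obs=4):
    """Per-class 90/10 training/validation split.

    observations: ordered list of (observation_id, class_id) pairs, with
    the original training observations preceding the original validation
    observations. Variant "A" trains on the first 90% per class, "B" on
    the last 90%, and "C" on the middle 90%. Classes with fewer than
    `min_obs` observations contribute all observations to training.
    """
    by_class = defaultdict(list)
    for obs_id, cls in observations:
        by_class[cls].append(obs_id)

    train, val = [], []
    for obs in by_class.values():
        if len(obs) < min_obs:
            train.extend(obs)              # tail-class exception
            continue
        n_val = max(1, round(0.1 * len(obs)))
        if variant == "A":                 # validation = last 10%
            train += obs[:-n_val]; val += obs[-n_val:]
        elif variant == "B":               # validation = first 10%
            train += obs[n_val:]; val += obs[:n_val]
        else:                              # "C": validation from both ends
            lo = n_val // 2
            hi = n_val - lo
            val += obs[:lo] + obs[len(obs) - hi:]
            train += obs[lo:len(obs) - hi]
    return train, val
```

The D split simply reuses the training/validation partition provided by the competition, so it needs no such construction.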
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>FixRes fine-tuning. All models were trained with seesaw loss and venom loss and all inference was performed using horizontal flipping, multi-instance averaging, and image size 576.</figDesc><table><row><cell>Models (data split)</cell><cell>FixRes</cell><cell>Public Track1↑</cell><cell>Public Track2↓</cell><cell>Public F1↑</cell><cell>Private Track1↑</cell><cell>Private Track2↓</cell><cell>Private F1↑</cell></row><row><cell>CAFormer-S18 (A)</cell><cell>-</cell><cell>79.41</cell><cell>1052</cell><cell>29.74</cell><cell>78.32</cell><cell>2729</cell><cell>26.18</cell></row><row><cell>CAFormer-S18 (A)</cell><cell></cell><cell>78.08</cell><cell>1140</cell><cell>28.11</cell><cell>76.01</cell><cell>3155</cell><cell>24.24</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Inclusion of sub-center ArcFace loss. All models were trained with seesaw loss and venom loss and all inference was performed using horizontal flipping, multi-instance averaging, and image size 576. * indicates that the model was trained with random erasing. CAFormer-S18 (A, B*, C*, D) was duplicated from the ensemble performance table to simplify comparisons. In cases where multiple data splits are denoted, this refers to an ensemble of multiple models, one per data split (e.g. CAFormer-S18 (A, B*, C*, D) refers to an ensemble of four CAFormer-S18 models, one trained on data split A, another on split B with random erasing, a third on split C with random erasing, and a fourth on split D).</figDesc><table><row><cell>Models (data split)</cell><cell>ArcFace</cell><cell>Public Track1↑</cell><cell>Public Track2↓</cell><cell>Public F1↑</cell><cell>Private Track1↑</cell><cell>Private Track2↓</cell><cell>Private F1↑</cell></row><row><cell>CAFormer-S18 (A, B*, C*, D)</cell><cell>-</cell><cell>81.2</cell><cell>945</cell><cell>33.35</cell><cell>79.58</cell><cell>2557</cell><cell>30.29</cell></row><row><cell>CAFormer-S18 (B*, C*, D) + CAFormer-S18 (A w/ ArcFace)</cell><cell></cell><cell>81.09</cell><cell>952</cell><cell>33.22</cell><cell>79.18</cell><cell>2636</cell><cell>29.73</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Balanced focal loss with higher dropout rate. In all cases, the models are CAFormer-S18 trained with venom loss using data split D. Inference was performed at a resolution of 768. Private leaderboard results omitted where unavailable. Track1, Track2, Public, and Private are abbreviated Trk1, Trk2, Pub, and Priv respectively.</figDesc><table><row><cell>Models (data split)</cell><cell>loss</cell><cell>dropout</cell><cell>Pub Trk1↑</cell><cell>Pub Trk2↓</cell><cell>Pub F1↑</cell><cell>Priv Trk1↑</cell><cell>Priv Trk2↓</cell><cell>Priv F1↑</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>seesaw</cell><cell>0.2</cell><cell>78.24</cell><cell>1125</cell><cell>27.50</cell><cell>76.95</cell><cell>2977</cell><cell>25.68</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>balanced focal</cell><cell>0.2</cell><cell>74.45</cell><cell>1361</cell><cell>20.91</cell><cell>-</cell><cell>-</cell><cell>-</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>balanced focal</cell><cell>0.4</cell><cell>74.66</cell><cell>1353</cell><cell>21.92</cell><cell>73.4</cell><cell>3486</cell><cell>18.87</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Metaformer-0 vs CAFormer-S18. Both models were trained with seesaw loss and venom loss on data split D. Both models used an inference image resolution of 384. No test time augmentations were used by either model. CAFormer-S18 outperforms Metaformer-0 in every metric except private leaderboard Track 2.</figDesc><table><row><cell>Models (data split)</cell><cell>Public Track1↑</cell><cell>Public Track2↓</cell><cell>Public F1↑</cell><cell>Private Track1↑</cell><cell>Private Track2↓</cell><cell>Private F1↑</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>76.16</cell><cell>1251</cell><cell>23.29</cell><cell>73.73</cell><cell>3558</cell><cell>20.26</cell></row><row><cell>Metaformer-0 (D)</cell><cell>74.53</cell><cell>1358</cell><cell>21.38</cell><cell>73.15</cell><cell>3554</cell><cell>18.51</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5</head><label>5</label><figDesc>Addition of class weight to venom loss. All models were trained with seesaw loss and venom loss and used multi-instance averaging at inference. Two different combinations of data split, horizontal flipping ("hflip"), and image size are shown with different row colors denoting each. The best results for each combination of data split, image size, and hflip are shown in bold.</figDesc><table><row><cell>Models (data split)</cell><cell>hflip</cell><cell>class weight</cell><cell>image size</cell><cell>Public Track1↑</cell><cell>Public Track2↓</cell><cell>Public F1↑</cell><cell>Private Track1↑</cell><cell>Private Track2↓</cell><cell>Private F1↑</cell></row><row><cell>CAFormer-S18 (A)</cell><cell></cell><cell>-</cell><cell>576</cell><cell>79.41</cell><cell>1052</cell><cell>29.74</cell><cell>78.32</cell><cell>2729</cell><cell>26.18</cell></row><row><cell>CAFormer-S18 (A)</cell><cell></cell><cell></cell><cell>576</cell><cell>77.91</cell><cell>1149</cell><cell>27.51</cell><cell>75.67</cell><cell>3191</cell><cell>23.08</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>-</cell><cell>-</cell><cell>768</cell><cell>78.24</cell><cell>1125</cell><cell>27.50</cell><cell>76.95</cell><cell>2977</cell><cell>25.68</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>-</cell><cell></cell><cell>768</cell><cell>76.28</cell><cell>1248</cell><cell>24.21</cell><cell>74.89</cell><cell>3273</cell><cell>20.60</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 6</head><label>6</label><figDesc>Ensemble performance. All models were trained with seesaw loss and venom loss. * indicates that the model was trained with random erasing.</figDesc><table><row><cell>Models (data split)</cell><cell>Public Track1↑</cell><cell>Public Track2↓</cell><cell>Public F1↑</cell><cell>Private Track1↑</cell><cell>Private Track2↓</cell><cell>Private F1↑</cell></row><row><cell>CAFormer-S18 (A*, C*)</cell><cell>79.17</cell><cell>1073</cell><cell>30.19</cell><cell>77.55</cell><cell>2901</cell><cell>26.74</cell></row><row><cell>CAFormer-S18 (B*, C*)</cell><cell>80.2</cell><cell>1000</cell><cell>30.62</cell><cell>78.11</cell><cell>2798</cell><cell>27.64</cell></row><row><cell>CAFormer-S18 (A*, B*, C*)</cell><cell>80.78</cell><cell>965</cell><cell>31.76</cell><cell>78.42</cell><cell>2743</cell><cell>27.87</cell></row><row><cell>CAFormer-S18 (A, B*, C*, D)</cell><cell>81.2</cell><cell>945</cell><cell>33.35</cell><cell>79.58</cell><cell>2557</cell><cell>30.29</cell></row><row><cell>CAFormer-S18 (B*, C*, D) + CAFormer-S36 (D)</cell><cell>81.07</cell><cell>954</cell><cell>33.28</cell><cell>79.96</cell><cell>2481</cell><cell>30.2</cell></row><row><cell>CAFormer-S36 (D)</cell><cell>79.95</cell><cell>1013</cell><cell>29.69</cell><cell>79.18</cell><cell>2607</cell><cell>28.23</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 7</head><label>7</label><figDesc>Averaging test-time augmentations. All models were trained with seesaw loss and venom loss. An image resolution of 576 is used for inference in all cases. Row color is used to differentiate data split and random erasing combinations. Best performance for each metric is in bold per data split and random erasing combination. * indicates that the model was trained with random erasing. Track1, Track2, Public, and Private are abbreviated Trk1, Trk2, Pub, and Priv respectively.</figDesc><table><row><cell>Models (data split)</cell><cell>hflip</cell><cell>multi-crop</cell><cell>multi-instance</cell><cell>Pub Trk1↑</cell><cell>Pub Trk2↓</cell><cell>Pub F1↑</cell><cell>Priv Trk1↑</cell><cell>Priv Trk2↓</cell><cell>Priv F1↑</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>78.39</cell><cell>1109</cell><cell>26.75</cell><cell>77.43</cell><cell>2857</cell><cell>25.02</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>-</cell><cell></cell><cell></cell><cell>79.94</cell><cell>1024</cell><cell>31.23</cell><cell>78.05</cell><cell>2782</cell><cell>27.08</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>-</cell><cell>-</cell><cell></cell><cell>79.87</cell><cell>1025</cell><cell>30.57</cell><cell>77.71</cell><cell>2825</cell><cell>26.26</cell></row><row><cell>CAFormer-S18 (D)</cell><cell></cell><cell>-</cell><cell></cell><cell>79.92</cell><cell>1023</cell><cell>30.89</cell><cell>77.88</cell><cell>2798</cell><cell>26.60</cell></row><row><cell>CAFormer-S18 (A)</cell><cell>-</cell><cell>-</cell><cell></cell><cell>79.19</cell><cell>1065</cell><cell>29.22</cell><cell>77.90</cell><cell>2807</cell><cell>26.07</cell></row><row><cell>CAFormer-S18 (A)</cell><cell></cell><cell>-</cell><cell></cell><cell>79.41</cell><cell>1052</cell><cell>29.74</cell><cell>78.32</cell><cell>2729</cell><cell>26.18</cell></row><row><cell>CAFormer-S18 (C*)</cell><cell>-</cell><cell>-</cell><cell></cell><cell>78.06</cell><cell>1133</cell><cell>26.80</cell><cell>76.26</cell><cell>3088</cell><cell>24.34</cell></row><row><cell>CAFormer-S18 (C*)</cell><cell></cell><cell>-</cell><cell></cell><cell>78.28</cell><cell>1118</cell><cell>27.09</cell><cell>76.19</cell><cell>3101</cell><cell>24.32</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 8</head><label>8</label><figDesc>Inference resolution. The same model is used with different inference image resolutions (image size). The model was trained with seesaw loss and venom loss. No additional test-time augmentations were performed (horizontal flip averaging, multiple crop averaging, multi-instance averaging). An image resolution of 576 dramatically outperforms 384, whereas it only slightly outperforms 768 on the Track 1 and Track 2 metrics. An image resolution of 768 achieves the best F1 score of the three resolutions.</figDesc><table><row><cell>Models (data split)</cell><cell>image size</cell><cell>Public Track1↑</cell><cell>Public Track2↓</cell><cell>Public F1↑</cell><cell>Private Track1↑</cell><cell>Private Track2↓</cell><cell>Private F1↑</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>384</cell><cell>76.16</cell><cell>1251</cell><cell>23.29</cell><cell>73.73</cell><cell>3558</cell><cell>20.26</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>576</cell><cell>78.39</cell><cell>1109</cell><cell>26.75</cell><cell>77.43</cell><cell>2857</cell><cell>25.02</cell></row><row><cell>CAFormer-S18 (D)</cell><cell>768</cell><cell>78.24</cell><cell>1125</cell><cell>27.50</cell><cell>76.95</cell><cell>2977</cell><cell>25.68</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_9"><head>Table 9</head><label>9</label><figDesc>Random erasing. All models were trained with seesaw loss and venom loss and utilized multi-instance averaging and image size 576 at inference. Despite slight performance improvements on a local validation set, random erasing appears to harm performance on the public and private leaderboards. * indicates that the model was trained with random erasing (RE).</figDesc><table><row><cell>Models (data split)</cell><cell>RE</cell><cell>Public Track1↑</cell><cell>Public Track2↓</cell><cell>Public F1↑</cell><cell>Private Track1↑</cell><cell>Private Track2↓</cell><cell>Private F1↑</cell></row><row><cell>CAFormer-S18 (A)</cell><cell>-</cell><cell>79.19</cell><cell>1065</cell><cell>29.22</cell><cell>77.9</cell><cell>2807</cell><cell>26.07</cell></row><row><cell>CAFormer-S18 (A*)</cell><cell></cell><cell>78.32</cell><cell>1121</cell><cell>27.99</cell><cell>75.59</cell><cell>3239</cell><cell>24.31</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_10"><head>Table 10</head><label>10</label><figDesc>Different data splits. All models were trained with seesaw loss and venom loss and utilized multi-instance averaging and image size 576. All the models were trained with random erasing.</figDesc><table><row><cell>Models (data split)</cell><cell>Public Track1↑</cell><cell>Public Track2↓</cell><cell>Public F1↑</cell><cell>Private Track1↑</cell><cell>Private Track2↓</cell><cell>Private F1↑</cell></row><row><cell>CAFormer-S18 (A*)</cell><cell>78.32</cell><cell>1121</cell><cell>27.99</cell><cell>75.59</cell><cell>3239</cell><cell>24.31</cell></row><row><cell>CAFormer-S18 (B*)</cell><cell>79.47</cell><cell>1049</cell><cell>29.86</cell><cell>77.25</cell><cell>2928</cell><cell>26.43</cell></row><row><cell>CAFormer-S18 (C*)</cell><cell>78.06</cell><cell>1133</cell><cell>26.80</cell><cell>76.26</cell><cell>3088</cell><cell>24.34</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_11"><head>Table 11</head><label>11</label><figDesc>Multi-res vs larger ensemble. The 4-model ensemble is duplicated from the ensemble performance table to facilitate a simpler comparison. All models were trained with seesaw loss and venom loss. Horizontal flipping and multi-instance averaging test-time augmentations were applied. At a similar compute budget, a larger ensemble outperforms multi-res. Track1, Track2, Public, and Private are abbreviated Trk1, Trk2, Pub, and Priv respectively. Where multiple data splits are indicated, the entry refers to an ensemble of models. For example, "CAFormer-S18 (C*, D)" denotes an ensemble comprising two CAFormer-S18 models: one trained on data split C with random erasing and another trained on data split D without random erasing. * indicates that the model was trained with random erasing.</figDesc><table><row><cell>Models (data split)</cell><cell>image size</cell><cell>Pub Trk1↑</cell><cell>Pub Trk2↓</cell><cell>Pub F1↑</cell><cell>Priv Trk1↑</cell><cell>Priv Trk2↓</cell><cell>Priv F1↑</cell></row><row><cell>CAFormer-S18 (A, B*, C*, D)</cell><cell>576</cell><cell>81.2</cell><cell>945</cell><cell>33.35</cell><cell>79.58</cell><cell>2557</cell><cell>30.29</cell></row><row><cell>CAFormer-S18 (C*, D)</cell><cell>576, 652</cell><cell>80.41</cell><cell>998</cell><cell>32.65</cell><cell>78.56</cell><cell>2712</cell><cell>28.06</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_12"><head>Table 12</head><label>12</label><figDesc>LogitNorm ablation. The addition of LogitNorm does not appear to improve the performance on any metric. Both models have image resolution 576 but no other test-time augmentations. Both models were trained with random erasing. Private leaderboard results unavailable.</figDesc><table><row><cell>Models (data split)</cell><cell>LogitNorm</cell><cell>Public Track1↑</cell><cell>Public Track2↓</cell><cell>Public F1↑</cell></row><row><cell>CAFormer-S18 (A*)</cell><cell></cell><cell>77.14</cell><cell>1190</cell><cell>25.14</cell></row><row><cell>CAFormer-S18 (A*)</cell><cell>-</cell><cell>77.67</cell><cell>1155</cell><cell>25.71</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_13"><head>Table 13</head><label>13</label><figDesc>Venom loss ablation. The addition of venom loss significantly improves the performance of the models across all metrics. Both models have image resolution 576 but no other test-time augmentations. Both models were trained with random erasing. The same baseline model results are shown in Table 12 for ease of comparison. Private leaderboard results unavailable.</figDesc><table><row><cell>Models (data split)</cell><cell>Venom loss</cell><cell>Public Track1↑</cell><cell>Public Track2↓</cell><cell>Public F1↑</cell></row><row><cell>CAFormer-S18 (A*)</cell><cell></cell><cell>77.14</cell><cell>1190</cell><cell>25.14</cell></row><row><cell>CAFormer-S18 (A*)</cell><cell>-</cell><cell>74.46</cell><cell>1375</cell><cell>23.17</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_14"><head>Table 14</head><label>14</label><figDesc>Public leaderboard performance for teams with selected models. My models (bold) achieve 2nd place in all metrics.</figDesc><table><row><cell>Rank</cell><cell>Team Name</cell><cell>Track1↑</cell><cell>Track2↓</cell><cell>F1↑</cell></row><row><cell>1</cell><cell>upupup</cell><cell>85.63</cell><cell>687</cell><cell>43.66</cell></row><row><cell>2</cell><cell>jack-etheredge</cell><cell>81.2</cell><cell>945</cell><cell>33.35</cell></row><row><cell>3</cell><cell>ZCU-KKY</cell><cell>69.92</cell><cell>1660</cell><cell>15.44</cell></row><row><cell>4</cell><cell>Autohome AI</cell><cell>59.11</cell><cell>2431</cell><cell>11.59</cell></row></table></figure><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_15"><head>Table 15</head><label>15</label><figDesc>Private leaderboard performance for teams with selected models. My models (bold) achieve 2nd place in all metrics.</figDesc><table><row><cell>Rank</cell><cell>Team Name</cell><cell>Track1↑</cell><cell>Track2↓</cell><cell>F1↑</cell></row><row><cell>1</cell><cell>upupup</cell><cell>83.57</cell><cell>1840</cell><cell>34.58</cell></row><row><cell>2</cell><cell>jack-etheredge</cell><cell>79.58</cell><cell>2557</cell><cell>30.29</cell></row><row><cell>3</cell><cell>ZCU-KKY</cell><cell>67</cell><cell>4611</cell><cell>13.29</cell></row><row><cell>4</cell><cell>Autohome AI</cell><cell>54.15</cell><cell>7063</cell><cell>9.22</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The author would like to thank Jillian Etheredge for constructive criticism of the manuscript.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Identifying the snake: First scoping review on practices of communities and healthcare providers confronted with snakebite across the world</title>
		<author>
			<persName><forename type="first">I</forename><surname>Bolon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Durso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">B</forename><surname>Mesa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Alcoba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Chappuis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R D</forename><surname>Castañeda</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0229989</idno>
		<ptr target="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0229989" />
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page">e0229989</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
		<respStmt>
			<orgName>Public Library of Science</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Snakebite and snake identification: empowering neglected communities and health-care providers with AI</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>De Castañeda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Durso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Fernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Alcoba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Chappuis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Salathé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Bolon</surname></persName>
		</author>
		<idno type="DOI">10.1016/S2589-7500(19)30086-X</idno>
	</analytic>
	<monogr>
		<title level="j">The Lancet Digital Health</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="e202" to="e203" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Overview of SnakeCLEF 2024: Revisiting snake species identification in medically important scenarios</title>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hruz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Durso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Overview of LifeCLEF 2024: Challenges on species distribution prediction and identification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Joly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Goëau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Espitalier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Botella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Deneu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Marcos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Estopinan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Leblanc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Larcher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Šulc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hrúz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Servajean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sulc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Matas</surname></persName>
		</author>
		<title level="m">Overview of FungiCLEF 2024: Revisiting fungi species recognition beyond 0-1 cost</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Overview of SnakeCLEF 2023: Snake Identification in Medically Important Scenarios</title>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chamidullin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hrúz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Durso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Šulc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chamidullin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Matas</surname></persName>
		</author>
		<title level="m">Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>Overview of FungiCLEF 2023: Fungi Recognition Beyond 1/0 Cost</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">1st Place Solution for FungiCLEF 2022 Competition: Fine-grained Open-set Fungi Recognition</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ruan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Han</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Metaformer Model with ArcFaceLoss and Contrastive Learning for SnakeCLEF2023 Fine-Grained Classification</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Qiu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Watch out Venomous Snake Species: A Solution to SnakeCLEF2023</title>
		<author>
			<persName><forename type="first">F</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Duan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X.-S</forename><surname>Wei</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.09748</idno>
		<ptr target="https://arxiv.org/abs/2307.09748" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Entropy-guided open-set fine-grained fungi recognition</title>
		<author>
			<persName><forename type="first">H</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhang</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:264441405" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Combination of Image and Location Information for Snake Species Identification using Object Detection and EfficientNets</title>
		<author>
			<persName><forename type="first">L</forename><surname>Bloch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Boketta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Keibel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mense</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Michailutschenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Pelka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Rückert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Willemeit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Friedrich</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:225071467" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Seesaw Loss for Long-Tailed Instance Segmentation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Loy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lin</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2008.10032</idno>
		<idno type="arXiv">arXiv:2008.10032</idno>
		<ptr target="http://arxiv.org/abs/2008.10032" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The Real-World-Weight Cross-Entropy Loss Function: Modeling the Costs of Mislabeling</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wookey</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2019.2962617</idno>
		<ptr target="https://ieeexplore.ieee.org/document/8943952/" />
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="4806" to="4813" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Focal Loss for Dense Object Detection</title>
		<author>
			<persName><forename type="first">T.-Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Girshick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dollár</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1708.02002</idno>
		<idno type="arXiv">arXiv:1708.02002</idno>
		<ptr target="http://arxiv.org/abs/1708.02002" />
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>version: 2</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">ArcFace: Additive Angular Margin Loss for Deep Face Recognition</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zafeiriou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A Simple Framework for Contrastive Learning of Visual Representations</title>
		<author>
			<persName><forename type="first">T</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kornblith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Norouzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
		<idno type="ISSN">2640-3498</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 37th International Conference on Machine Learning</title>
				<meeting>the 37th International Conference on Machine Learning<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1597" to="1607" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A ConvNet for the 2020s</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Feichtenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Darrell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Xie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<publisher>CVPR</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">MetaFormer: A Unified Meta Framework for Fine-Grained Recognition</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Diao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yuan</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2203.02751</idno>
		<idno type="arXiv">arXiv:2203.02751</idno>
		<ptr target="http://arxiv.org/abs/2203.02751" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">G</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2103.10158</idno>
		<idno type="arXiv">arXiv:2103.10158</idno>
		<ptr target="http://arxiv.org/abs/2103.10158" />
		<title level="m">TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Random Erasing Data Augmentation</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1708.04896</idno>
		<idno type="arXiv">arXiv:1708.04896</idno>
		<ptr target="http://arxiv.org/abs/1708.04896" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cisse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">N</forename><surname>Dauphin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lopez-Paz</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/1710.09412" />
		<idno type="arXiv">arXiv:1710.09412</idno>
		<title level="m">mixup: Beyond Empirical Risk Minimization</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features</title>
		<author>
			<persName><forename type="first">S</forename><surname>Yun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Oh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yoo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Choe</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICCV.2019.00612</idno>
		<ptr target="https://ieeexplore.ieee.org/document/9008296/" />
	</analytic>
	<monogr>
		<title level="m">2019 IEEE/CVF International Conference on Computer Vision (ICCV)</title>
				<meeting><address><addrLine>Seoul, Korea (South)</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="6022" to="6031" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">RandoMix: a mixed sample data augmentation method with multiple mixed modes</title>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Nie</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11042-024-18868-8</idno>
		<ptr target="https://link.springer.com/10.1007/s11042-024-18868-8" />
	</analytic>
	<monogr>
		<title level="j">Multimedia Tools and Applications</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zafeiriou</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-58621-8_43</idno>
		<ptr target="https://link.springer.com/10.1007/978-3-030-58621-8_43" />
	</analytic>
	<monogr>
		<title level="m">Sub-center ArcFace: Boosting Face Recognition by Large-Scale Noisy Web Faces</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">12356</biblScope>
			<biblScope unit="page" from="741" to="757" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Mitigating Neural Network Overconfidence with Logit Normalization</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>An</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2205.09310</idno>
		<idno type="arXiv">arXiv:2205.09310</idno>
		<ptr target="http://arxiv.org/abs/2205.09310" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">OpenWGAN-GP for Fine-Grained Open-Set Fungi Classification</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">N</forename><surname>Etheredge</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Decoupled Weight Decay Regularization</title>
		<author>
			<persName><forename type="first">I</forename><surname>Loshchilov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1711.05101</idno>
		<idno type="arXiv">arXiv:1711.05101</idno>
		<ptr target="http://arxiv.org/abs/1711.05101" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">ImageNet: A large-scale hierarchical image database</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fei-Fei</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2009.5206848</idno>
		<ptr target="https://ieeexplore.ieee.org/document/5206848" />
	</analytic>
	<monogr>
		<title level="m">2009 IEEE Conference on Computer Vision and Pattern Recognition</title>
		<idno type="ISSN">1063-6919</idno>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="248" to="255" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Van Horn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">Mac</forename><surname>Aodha</surname></persName>
		</author>
		<ptr target="https://kaggle.com/competitions/inaturalist-2021" />
		<title level="m">iNat Challenge 2021 -FGVC8</title>
				<imprint>
			<publisher>Kaggle</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
