<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">OpenWGAN-GP for Fine-Grained Open-Set Fungi Classification</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jack</forename><forename type="middle">N</forename><surname>Etheredge</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Twosense</orgName>
								<address>
									<settlement>New York</settlement>
									<region>New York</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">OpenWGAN-GP for Fine-Grained Open-Set Fungi Classification</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">A2874613F8501C63FE2AE682A014BF7B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:57+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>OpenWGAN-GP</term>
					<term>OpenGAN</term>
					<term>Open-set recognition</term>
					<term>Fine-grained classification</term>
					<term>FungiCLEF</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Understanding and accurately classifying fungi is crucial for ecological studies, food safety, and public health. In this paper, I present my approach to the FungiCLEF 2024 challenge, which aims to classify images of fungi, identify open-set "unknown" fungi species in the test data, and reduce the confusion between edible and poisonous fungi. This method leverages a combination of Metaformer-0, Metaformer-2, and CAFormer-S18 models, chosen for their strong classification performance relative to their computational efficiency. The Metaformer-0 and Metaformer-2 models utilize metadata, while CAFormer-S18 does not, yet all belong to the same family of models known as Metaformers and employ convolutional blocks followed by multi-headed self-attention transformer blocks. My primary novel contribution is the application of OpenGAN to detect unknown fungi species, enhanced by incorporating WGAN-GP to improve training stability, resulting in a new open-set classifier training paradigm I term OpenWGAN-GP. This approach enables a lightweight discriminator to utilize the latent representations from the closed-set classifier for binary classification of open-set vs. closed-set species. My best-performing ensemble achieved public leaderboard scores of 0.2394 for Track 1, 0.1681 for Track 2, and 0.4075 for Track 3, along with a macro-averaged F1 score of 49.81%. Track 1 represents the classification loss with unknowns, Track 2 represents the edible-poisonous confusion loss (weighted heavily for poisonous to edible misclassifications), and Track 3 is the sum of Track 1 and Track 2. My method secured 1st place in the FungiCLEF 2024 competition for Track 1, F1, and Accuracy on the private leaderboard. Code is available at https://github.com/Jack-Etheredge/fungiclef2024.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Correctly identifying mushroom species and distinguishing between poisonous and edible varieties are critical for public health. In 2023 alone, China had 1,303 reported cases of mushroom poisoning traced to 97 species of mushroom, of which 12 were newly discovered species <ref type="bibr" target="#b0">[1]</ref>. FungiCLEF 2024 <ref type="bibr" target="#b1">[2]</ref> is a competition held as part of the LifeCLEF 2024 <ref type="bibr" target="#b2">[3]</ref> lab. FungiCLEF is a long-tailed fine-grained open-set classification task with an additional asymmetrically weighted edible-poisonous confusion component. In this work, I propose a novel solution for fine-grained classification of fungi species that simultaneously minimizes confusion between edible and poisonous species and detects fungi species unknown to the training dataset. The primary contributions of this work are 1) an open-set recognition classifier, trained on the embeddings of the closed-set classifier, applied to fine-grained open-set recognition, and 2) an ensemble of computationally lean models with carefully selected test-time augmentations. Extensive experimentation was used to improve the training methodology of this discriminator. I refer to this optimized open-set classifier training paradigm as OpenWGAN-GP.</p><p>Seesaw loss is a modification to standard cross-entropy loss. Given predicted logits 𝑧 and predicted probabilities 𝜎 from the classifier, where 𝑦_𝑖 is the one-hot encoded ground-truth label with 1 ≤ 𝑖 ≤ 𝐶, seesaw loss is defined as</p><formula xml:id="formula_0">L_{\mathrm{seesaw}}(z) = -\sum_{i=1}^{C} y_i \log(\sigma_i),<label>(1)</label></formula><p>over the 𝐶 classes, where 𝜎_𝑖 is defined as</p><formula xml:id="formula_1">\sigma_i = \frac{e^{z_i}}{\sum_{j \neq i}^{C} S_{ij} e^{z_j} + e^{z_i}}.<label>(2)</label></formula><p>𝑆_𝑖𝑗 is a balancing coefficient between different classes, determined by a combination of a mitigation factor 𝑀_𝑖𝑗 and a compensation factor 𝐶_𝑖𝑗:</p><formula xml:id="formula_3">S_{ij} = M_{ij} \cdot C_{ij}<label>(3)</label></formula><p>𝑀_𝑖𝑗 mitigates the penalty on tail classes by decreasing the penalty on class 𝑗 in proportion to the ratio of instance counts between the less abundant tail class 𝑗 and the more abundant head class 𝑖. Conversely, 𝐶_𝑖𝑗 increases the penalty on class 𝑗 whenever a misclassification occurs from class 𝑖 to class 𝑗. This dual-factor approach in 𝑆_𝑖𝑗 allows seesaw loss to dynamically adjust penalties based on both instance distribution and misclassification behavior, optimizing the learning process in long-tailed multi-class classification tasks. The loss function is explained in greater detail in the original paper <ref type="bibr" target="#b19">[20]</ref>.</p><p>The closed-set image classification models used to classify the images belong to a family of hybrid convolutional and self-attention transformer models known as Metaformers <ref type="bibr" target="#b20">[21]</ref>. These models are explained in detail in Section 3.5.</p><p>The other challenge is the recognition of the open-set "unknown" class. 
OpenWGAN-GP was used to classify images as belonging to the closed-set or open-set datasets. The architecture of the open-set discriminator and the OpenWGAN-GP training methodology are described in greater detail in Section 3.6.</p></div>
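Since the seesaw softmax of Eq. (2) is equivalent to a standard softmax over logits shifted by log 𝑆_𝑖𝑗, the loss can be sketched compactly. Below is a minimal PyTorch sketch of Eqs. (1)-(3); the exponents p and q and the use of fixed precomputed class counts (the original formulation accumulates counts online) are assumed defaults from the seesaw-loss paper [20], not confirmed details of this work.

```python
import torch
import torch.nn.functional as F

def seesaw_loss(logits, targets, class_counts, p=0.8, q=2.0):
    """Sketch of Eqs. (1)-(3). class_counts holds per-class instance counts;
    p and q are the mitigation/compensation exponents (assumed defaults)."""
    counts = class_counts.float().clamp(min=1)
    # Mitigation factor M[i, j] = min(1, (N_j / N_i)^p) for true class i.
    ratio = counts[None, :] / counts[:, None]
    M = ratio.clamp(max=1.0) ** p
    # Compensation factor C[i, j] = max(1, (sigma_j / sigma_i)^q).
    probs = logits.softmax(dim=-1).detach()
    sigma_true = probs.gather(1, targets[:, None])
    comp = (probs / sigma_true).clamp(min=1.0) ** q
    # Per-sample S row; S has no effect on the true-class logit (log 1 = 0).
    S = M[targets] * comp
    S.scatter_(1, targets[:, None], 1.0)
    # sigma_i of Eq. (2) equals a softmax over logits shifted by log S.
    return F.cross_entropy(logits + S.log(), targets)

# e.g. loss = seesaw_loss(model(images), labels, counts_tensor)
```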
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Evaluation metrics</head><p>The public leaderboard for the competition reported multiple metrics for each submission. Track 1 was a classification loss that included unknowns. Track 2 was an edible-poisonous confusion loss with a ×100 weight for poisonous → edible misclassification. Track 3 was the sum of the Track 1 and Track 2 losses. Additionally, the macro-averaged F1 score and accuracy were reported. The accuracy has been ignored for all experimental results reported here. Apart from the macro-averaged F1 score, none of the metrics correct for class imbalance. Thus, classification performance on each class affects the final scores for Tracks 1-3 in proportion to the number of observations for that class.</p><p>Track 1 loss is a standard classification error with an additional "unknown" class:</p><formula xml:id="formula_4">L_1 = \sum_i W_1(y_i, q(x_i)),<label>(4)</label></formula><p>for class predictions 𝑞(𝑥) for observations 𝑥 from a classifier 𝑞 and true labels 𝑦. The cost function 𝑊_1 is defined as</p><formula xml:id="formula_5">W_1(y, q(x)) = \begin{cases} 0 &amp; \text{if } q(x) = y \\ 1 &amp; \text{otherwise} \end{cases}<label>(5)</label></formula><p>Track 2 loss penalizes the confusion of edible and poisonous species. Consider a function 𝑝 that indicates poisonous species as 𝑝(𝑦) = 1 if species 𝑦 is poisonous, and 𝑝(𝑦) = 0 otherwise. Let 𝑐_𝑃𝑆𝐶 denote the cost for poisonous → edible misclassification (a poisonous observation was predicted as edible) and 𝑐_𝐸𝑆𝐶 the cost for edible → poisonous misclassification, with 𝑐_𝐸𝑆𝐶 = 1 and 𝑐_𝑃𝑆𝐶 = 100. Track 2 loss is defined as</p><formula xml:id="formula_7">L_2 = \sum_i W_2(y_i, q(x_i)),<label>(6)</label></formula><p>for class predictions 𝑞(𝑥) for observations 𝑥 from a classifier 𝑞 and true labels 𝑦 as in 𝐿_1. The cost function 𝑊_2 is defined as</p><formula xml:id="formula_8">W_2(y, q(x)) = \begin{cases} 0 &amp; \text{if } p(y) = p(q(x)) \\ c_{PSC} &amp; \text{if } p(y) = 1 \text{ and } p(q(x)) = 0 \\ c_{ESC} &amp; \text{otherwise} \end{cases}<label>(7)</label></formula><p>Track 3 (the "user-focused loss") is simply the sum of the Track 1 and Track 2 losses:</p><formula xml:id="formula_9">L_3 = L_1 + L_2.<label>(8)</label></formula></div>
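To make the metrics concrete, here is a small sketch of Eqs. (4)-(8) over lists of class ids; the helper names are hypothetical, and any normalization the leaderboard applies (e.g., averaging over observations) is omitted.

```python
def track_losses(y_true, y_pred, poisonous_ids, c_psc=100, c_esc=1):
    """Sketch of Eqs. (4)-(8); the open-set "unknown" class counts as edible."""
    def p(y):  # poison indicator function from the text
        return 1 if y in poisonous_ids else 0

    track1 = sum(0 if q == y else 1 for y, q in zip(y_true, y_pred))  # Eqs. (4)-(5)
    track2 = 0
    for y, q in zip(y_true, y_pred):  # Eqs. (6)-(7)
        if p(y) != p(q):
            track2 += c_psc if p(y) == 1 else c_esc  # poisonous -> edible costs 100
    return track1, track2, track1 + track2  # Track 3 is the sum, Eq. (8)
```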
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Custom poison loss</head><p>A custom poison loss was used for all of the final models. The poison loss was formulated as a class-weighted binary cross-entropy loss.</p><p>Given the set of poisonous classes 𝑃 and the set of edible classes 𝜀, the probabilities are summed over each set independently, where 𝑃_𝑖 is the softmax probability output for class 𝑖:</p><formula xml:id="formula_10">P_{\mathrm{poisonous}} = \sum_{i \in P} P_i,<label>(9)</label></formula><formula xml:id="formula_11">P_{\mathrm{edible}} = \sum_{i \in \varepsilon} P_i.<label>(10)</label></formula><p>Let 𝑦 ∈ {0, 1} be the binary ground-truth label for an image, where 1 indicates a poisonous class and 0 indicates an edible class, and let 𝛼 = 100 be the weight assigned to the edible class to penalize edible → poisonous misclassifications. The weighted cross-entropy loss is then</p><formula xml:id="formula_12">L_{\mathrm{poison,weighted}} = -[y \log(P_{\mathrm{poisonous}}) + \alpha(1 - y) \log(P_{\mathrm{edible}})]<label>(11)</label></formula><p>Since the softmax probabilities output by the model sum to 1, the summed poisonous and edible probabilities can be used directly as the prediction for the binary cross-entropy criterion. A weight of 100 was assigned to the edible class, since edible → poisonous misclassifications (true label is edible, predicted label is poisonous) were penalized ×100 in the Track 2 loss. The total training loss was the sum of the seesaw loss and the custom poison loss. This approach ensures that the training process emphasizes correctly classifying edible species as edible, thus reducing the risk of mistakenly classifying edible species as poisonous, which is heavily penalized in the evaluation metrics.</p></div>
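A minimal PyTorch sketch of Eqs. (9)-(11) follows; the epsilon added inside the logarithms for numerical stability is my addition, not part of the stated formulation. The total training loss would then be this value plus the seesaw loss.

```python
import torch

def poison_loss(probs, y_binary, poison_mask, alpha=100.0, eps=1e-8):
    """Weighted binary cross-entropy of Eq. (11).

    probs:       (B, C) softmax outputs of the closed-set classifier
    y_binary:    (B,) 1.0 if the true species is poisonous, 0.0 if edible
    poison_mask: (C,) boolean, True for poisonous classes
    """
    p_poisonous = probs[:, poison_mask].sum(dim=1)    # Eq. (9)
    p_edible = probs[:, ~poison_mask].sum(dim=1)      # Eq. (10)
    loss = -(y_binary * torch.log(p_poisonous + eps)
             + alpha * (1.0 - y_binary) * torch.log(p_edible + eps))
    return loss.mean()
```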
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Model architectures</head><p>All experiments were performed on a machine with a single NVIDIA RTX 3090 graphics card, and all models were trained using PyTorch <ref type="bibr" target="#b21">[22]</ref>. Given that I was working on a single computer for this competition, efficient use of the limited compute available was of critical importance. Ensembling and test-time augmentations were used to increase performance while keeping training efficiency in check. An ensemble of computationally lean models has been shown to outperform a single larger model with respect to both training and inference cost <ref type="bibr" target="#b22">[23]</ref>. Model architectures were chosen based on their performance on ImageNet and/or iNaturalist relative to the computational complexity of the models in TFLOPs, aiming for a final ensemble of at least two models. Test-time augmentations allow for much greater performance without requiring any additional training of the models, making them a particularly attractive target for optimization when compute and time are both limited.</p><p>Metaformers <ref type="bibr" target="#b20">[21]</ref> are a family of models that combine different tokenizers with a transformer backbone. A collection of Metaformer models (Metaformer-0, Metaformer-1, and Metaformer-2) were created in <ref type="bibr" target="#b3">[4]</ref> that combine metadata with the images to improve the classification performance of the models on multiple fine-grained image datasets. CAFormer <ref type="bibr" target="#b23">[24]</ref> models are very similar in architecture to the Metaformer variants proposed in <ref type="bibr" target="#b3">[4]</ref>, but do not make use of metadata. In the final ensemble, Metaformer-0, Metaformer-2, and CAFormer-S18 were used. Hereafter, Metaformer refers to the Metaformer models from <ref type="bibr" target="#b3">[4]</ref>, which incorporate metadata information.</p><p>I fine-tuned CAFormer-S18 with weights pretrained on ImageNet-21K <ref type="bibr" target="#b24">[25]</ref>, while Metaformer-0 and Metaformer-2 models were pretrained on iNaturalist2021 <ref type="bibr" target="#b4">[5]</ref>. CAFormer-S18 was fine-tuned on a different train-validation split than the two Metaformer models. Metaformer-0 and Metaformer-2 differ only in the number of channels in the convolutional and transformer blocks, with Metaformer-2 having more channels in every block. The S18 variant of CAFormer refers to a specific combination of convolutional and self-attention token mixers: CAFormer-S18 utilizes a total of 18 blocks, namely 3 convolution blocks with 64 channels, 3 convolution blocks with 128 channels, 9 attention blocks with 320 channels, and 3 attention blocks with 512 channels.</p></div>
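As a sketch of how a closed-set backbone might be instantiated, assuming the timm library exposes a caformer_s18 checkpoint (the exact checkpoint name and class count below are assumptions; the metadata-aware Metaformer models come from the reference implementation of [4] rather than timm):

```python
import timm

num_fungi_classes = 1604  # hypothetical; use the actual number of closed-set species

# CAFormer-S18 pretrained on ImageNet-21K, with a fresh classification head.
model = timm.create_model(
    "caformer_s18.sail_in22k",  # assumed timm checkpoint name
    pretrained=True,
    num_classes=num_fungi_classes,
)
```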
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6.">OpenGAN and OpenWGAN-GP</head><p>To my knowledge, this is the first time that OpenGAN has been utilized for open-set recognition of fine-grained images beyond digit recognition. To improve training stability, I incorporated the Wasserstein GAN loss and gradient penalty (WGAN-GP) <ref type="bibr" target="#b25">[26]</ref> into the training of OpenGAN <ref type="bibr" target="#b12">[13]</ref> to create OpenWGAN-GP. In addition to incorporating WGAN-GP, batch normalization layers were replaced with layer normalization layers in the discriminator, as suggested in <ref type="bibr" target="#b25">[26]</ref>.</p><p>OpenGAN proposes selecting the discriminator against a validation set of open- and closed-set examples, choosing the discriminator with the best validation ROC-AUC. However, since ROC-AUC is calculated over a range of classification thresholds, the best classification threshold would also need to be determined for each OpenGAN discriminator. For OpenWGAN-GP, the macro-averaged F1 score was therefore used as the selection metric instead of ROC-AUC. Because the discriminators are then all selected at the same fixed classification threshold, the probabilities of an ensemble of OpenWGAN-GP models can be averaged without calibrating a threshold per model. An alternative strategy would have been voting, which would have allowed a different classification threshold per model, but this was not explored in this study.</p><p>OpenGAN is a methodology for training a lightweight discriminator that utilizes the intermediate representation of an image to generate a binary classification for open-set recognition. Several related methods were proposed, but the one that performed best in their experiments, and the one that I focus on in this work, is OpenGAN-fea with the inclusion of open-set training data, which I will simply refer to as OpenGAN. This paradigm allows for training a classifier without initial consideration of the open-set data. The discriminator is a multilayer perceptron consisting of fully connected layers with sizes 𝐷 → 8𝐻 → 4𝐻 → 2𝐻 → 𝐻 → 1, where 𝐷 represents the dimension of the intermediate representation from the closed-set classifier and 𝐻 is a hidden dimension multiplier; 𝐻 = 64 unless otherwise specified. The output layer uses a sigmoid activation function. Batch normalization <ref type="bibr" target="#b26">[27]</ref> and LeakyReLU <ref type="bibr" target="#b27">[28]</ref> are used between the dense layers. During training, the generator generates a feature vector of length 𝐷 from a 100-dimensional input vector with each value sampled independently from a standard normal distribution (mean 0, variance 1). The generator is also a multilayer perceptron with batch normalization and LeakyReLU. It has a similar architecture, with the critical difference that its output dimension must match the input dimension 𝐷 of the discriminator <ref type="bibr" target="#b28">[29]</ref>. The discriminator is updated twice per update of the generator, since it is trained both adversarially against the generator (real vs. fake) and with supervision (open- vs. closed-set). 
This training paradigm is illustrated in Figure <ref type="figure" target="#fig_0">1</ref>. The Adam optimizer <ref type="bibr" target="#b29">[30]</ref> was used with a learning rate of 1e-4 for the discriminator and 2e-4 for the generator; the higher generator learning rate compensates for the 2:1 ratio of discriminator to generator updates.</p></div>
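A minimal sketch of the OpenWGAN-GP discriminator, generator, and gradient penalty follows. The discriminator sizes match the text (𝐷 → 8𝐻 → 4𝐻 → 2𝐻 → 𝐻 → 1 with 𝐻 = 64, LayerNorm replacing BatchNorm); the generator's hidden sizes, the LeakyReLU slope, and the penalty weight 𝜆 = 10 (the WGAN-GP default) are assumptions, not details confirmed by this work.

```python
import torch
import torch.nn as nn

def mlp(sizes, norm):
    """Fully connected stack; hidden layers get normalization + LeakyReLU."""
    layers = []
    for i, (d_in, d_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        layers.append(nn.Linear(d_in, d_out))
        if i < len(sizes) - 2:
            layers.append(norm(d_out))
            layers.append(nn.LeakyReLU(0.2))  # slope assumed
    return nn.Sequential(*layers)

class Discriminator(nn.Module):
    def __init__(self, feat_dim, h=64):
        super().__init__()
        # LayerNorm instead of BatchNorm, as suggested for WGAN-GP critics [26].
        self.net = mlp([feat_dim, 8 * h, 4 * h, 2 * h, h, 1], nn.LayerNorm)

    def forward(self, feats):
        # Raw score for the adversarial (Wasserstein) objective; apply
        # torch.sigmoid for the supervised open-vs-closed binary head.
        return self.net(feats)

class Generator(nn.Module):
    def __init__(self, feat_dim, z_dim=100, h=64):
        super().__init__()
        # Hidden sizes assumed; output dimension must match the feature dim D.
        self.net = mlp([z_dim, h, 2 * h, 4 * h, feat_dim], nn.BatchNorm1d)

    def forward(self, z):
        return self.net(z)

def gradient_penalty(disc, real_feats, fake_feats, lam=10.0):
    """WGAN-GP penalty on interpolates between real and fake features [26]."""
    alpha = torch.rand(real_feats.size(0), 1, device=real_feats.device)
    interp = (alpha * real_feats + (1 - alpha) * fake_feats).requires_grad_(True)
    scores = disc(interp)
    grads, = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()

# Per the text, the discriminator would be updated twice (once supervised on
# open- vs. closed-set features, once adversarially) per generator update.
```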
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.7.">Metadata</head><p>Metaformer-0 and Metaformer-2 allow the fusion of metadata with the vision information. Metadata was utilized to provide the model with information concerning location, local growth conditions, and temporal information by including the country code, substrate, habitat, and observation date (month and day). Example substrates include "fruits", "wood", "cones", "soil", and "peat mosses", while example habitats include "bog", "dune", "meadow", and "roof". There are 34, 32, and 31 categories for country code, substrate, and habitat, respectively. Metadata was preprocessed for Metaformer-0 and Metaformer-2 according to <ref type="bibr" target="#b16">[17]</ref>. The month and day were transformed by periodic encoding into [sin(2𝜋·month/12), cos(2𝜋·month/12)] and [sin(2𝜋·day/31), cos(2𝜋·day/31)], respectively, to preserve temporal relationships. The country code, habitat, and substrate were all one-hot encoded. Metaformer-0 and Metaformer-2 utilize trainable embeddings to project this encoded metadata to the same dimensionality as the image features in order to fuse them with the latent representation of the images.</p></div>
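A small sketch of the metadata preprocessing described above; the handling of missing metadata fields is an assumption for illustration.

```python
import numpy as np

def encode_date(month, day):
    """Periodic encoding of month and day, preserving temporal adjacency."""
    return [
        np.sin(2 * np.pi * month / 12), np.cos(2 * np.pi * month / 12),
        np.sin(2 * np.pi * day / 31), np.cos(2 * np.pi * day / 31),
    ]

def one_hot(index, num_categories):
    """One-hot encoding for country code (34), substrate (32), or habitat (31)."""
    v = np.zeros(num_categories)
    if index is not None:  # assumed: missing metadata yields an all-zero vector
        v[index] = 1.0
    return v
```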
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.8.">Training settings</head><p>I used the AdamW optimizer <ref type="bibr" target="#b30">[31]</ref> with an initial learning rate of 1e-3 applied only to the classification output dense layer, with the pretrained backbone frozen, for the first 5 epochs; the learning rate was then reduced to 5e-5 for subsequent epochs. CAFormer-S18 models were trained with a batch size of 40, Metaformer-0 models with a batch size of 32, and Metaformer-2 models with a batch size of 12. A weight decay of 0.05 was used for all of these models. The learning rate was reduced by a factor of 0.1 if the model did not improve the validation loss for 5 consecutive epochs. Early stopping was also employed when training all models, to avoid wasting compute time on models that were no longer improving in their generalization to the validation set, as measured by the validation loss.</p><p>LogitNorm <ref type="bibr" target="#b31">[32]</ref> is a technique that L2-normalizes the logits during training (the normalization is not applied during inference). It was included with the hope that it would improve the separation of the classes in the embedding space relative to standard seesaw loss, which might enable OpenWGAN-GP to leverage the embedding space for more accurate open-set recognition. Additionally, LogitNorm was shown to act similarly to temperature scaling <ref type="bibr" target="#b32">[33]</ref> in producing models with less overconfident predictions. This is important for maximum softmax probability or entropy thresholding, which were explored as alternatives to OpenWGAN-GP.</p></div>
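A minimal sketch of LogitNorm as described in [32]; the temperature value below is a common default from the original paper, not a value stated in this work, and it is shown with standard cross-entropy, whereas this work pairs the normalized logits with seesaw loss.

```python
import torch.nn.functional as F

def logitnorm_loss(logits, targets, tau=0.04, eps=1e-7):
    """Cross-entropy on L2-normalized logits, per LogitNorm [32].
    Applied only during training; raw logits are used at inference."""
    norms = logits.norm(p=2, dim=-1, keepdim=True) + eps
    return F.cross_entropy(logits / (norms * tau), targets)
```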
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.9.">Training data augmentation</head><p>Training was performed with a square random crop, TrivialAugment <ref type="bibr" target="#b33">[34]</ref>, horizontal flip with 50% probability, and GridMask <ref type="bibr" target="#b34">[35]</ref> with a probability of 20%, applied in that order.</p></div>
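A sketch of this pipeline with torchvision (which exposes TrivialAugment as TrivialAugmentWide); GridMask is not part of torchvision, so a simplified stand-in implementation is included here, and the crop size and grid geometry are assumptions.

```python
import torch
from torchvision import transforms

class GridMask:
    """Minimal GridMask-style occlusion [35]; a simplified sketch, not the
    reference implementation (grid period and mask ratio are assumed)."""
    def __init__(self, p=0.2, d=96, ratio=0.5):
        self.p, self.d, self.ratio = p, d, ratio

    def __call__(self, img):  # img: (C, H, W) tensor
        if torch.rand(()) > self.p:
            return img
        _, h, w = img.shape
        mask = torch.ones(h, w)
        k = int(self.d * self.ratio)  # width of each masked cell
        for y in range(0, h, self.d):
            for x in range(0, w, self.d):
                mask[y:y + k, x:x + k] = 0  # zero out one cell per grid period
        return img * mask

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(384),      # square random crop (size assumed)
    transforms.TrivialAugmentWide(),        # TrivialAugment [34]
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    GridMask(p=0.2),                        # GridMask [35], simplified
])
```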
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.10.">Test-time augmentations</head><p>At test time, all images were resized with bicubic interpolation along the shortest dimension to 384 or 576, depending on the model, followed by a square center crop of the same size. Test-time augmentations and ensembling were instrumental techniques for the final inference performance. Using a larger image size for inference than for training was shown to improve accuracy on multiple datasets in FixRes <ref type="bibr" target="#b35">[36]</ref>.</p><p>The strategies employed were averaging horizontal flips, multi-instance averaging, ensemble averaging, and inference at a higher resolution (576×576) for CAFormer-S18 relative to training (384×384). The overall inference pipeline is summarized in Figure <ref type="figure">2</ref>. FiveCrop was also investigated, but could not be incorporated within the allowed compute budget; since it did not yield as significant an improvement as a larger ensemble with horizontal flipping, according to local evaluation and public leaderboard scores for Track 3, the chosen configuration was preferred. It could be useful in the future to experiment with ensembling techniques that are more sophisticated than simple averaging, but none were attempted in this study.</p><p>Because the open-set "unknown" class is implicitly edible, the penalty for misclassifying poisonous mushrooms as unknown was greater than the resulting decrease in misclassification loss. To mitigate this in the proposed solution, if the top prediction of the classification network was a poisonous mushroom, the prediction from the open-set classifier was ignored for that observation.</p></div>
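The resulting decision rule of Figure 2(b) can be sketched as follows, with hypothetical inputs: ensemble-averaged closed-set probabilities and an averaged open-set probability per observation.

```python
import numpy as np

def final_prediction(closed_probs, open_prob, poisonous_ids,
                     open_threshold=0.5, unknown_id=-1):
    """Decision rule of Figure 2(b); unknown_id is a hypothetical sentinel."""
    top = int(np.argmax(closed_probs))
    if top in poisonous_ids:
        return top  # "ignore poison pred": keep the closed-set prediction
    if open_prob > open_threshold:
        return unknown_id  # flagged as an open-set ("unknown") species
    return top
```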
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental results</head><p>My best-performing ensemble and OpenWGAN-GP combinations achieved 1st place on Track 1, F1, and Accuracy on the private leaderboard. All results reported are for the public leaderboard test set unless explicitly stated otherwise.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Open-set recognition</head><p>Different open-set detection methods were evaluated against the public leaderboard; the results can be seen in Table <ref type="table" target="#tab_0">1</ref>. Experiments with a local validation set showed that temperature scaling improved the performance of maximum softmax probability (MSP) thresholding as well as softmax entropy thresholding (experiments not shown). OpenWGAN-GP consistently performed better on the leaderboard than softmax thresholding or entropy thresholding, even after temperature scaling of the probabilities <ref type="bibr" target="#b32">[33]</ref>. As can be seen in Table <ref type="table" target="#tab_0">1</ref>, the performance of either of these methods depends on optimizing a threshold. The optimal entropy threshold for local validation was 6, which did not appear to be optimal for the public leaderboard, suggesting that this method may not generalize well between test sets. OpenWGAN-GP is a binary classifier that is selected using the macro-F1 score with a classification threshold of 0.5, which means that no additional thresholding should be needed to generalize between test sets. Table <ref type="table" target="#tab_0">1</ref> shows that, despite not tuning the classification threshold, OpenWGAN-GP achieves the best Track 1 and Track 3 performance while maintaining Track 2 performance similar to MSP and entropy thresholding once poisonous → unknown misclassification is avoided, as explained below. The results in Table <ref type="table" target="#tab_0">1</ref> also show that ignoring OpenWGAN-GP predictions for open-set recognition in the cases when the highest predicted probability belongs to a poisonous species ("ignore poison pred") is critical to preventing the open-set recognition from degrading performance on Track 3. OpenWGAN-GP without ignore poison pred achieves a better Track 1 score than OpenWGAN-GP with ignore poison pred, but a much higher Track 2 score. This suggests that in many cases OpenWGAN-GP is correctly identifying unknowns that the classification network predicts to be poisonous, but that the poisonous → edible cost for these poisonous closed → open misclassifications overwhelms the improvement in classification loss. This reinforces how challenging it is to simultaneously optimize classification performance, identification of unknown species, and avoidance of misclassifying poisonous species as edible.</p><p>Following <ref type="bibr" target="#b16">[17]</ref>, I explored fine-tuning the models through outlier exposure after first training the models without the inclusion of unknowns, but the validation loss failed to improve after the first epoch upon inclusion of unknowns (results not shown). It appears that unknowns were included for the entire duration of training in their work. Unfortunately, I was unable to complete this experiment before the competition concluded.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Model architectures and ensemble selection</head><p>Multiple architectures were evaluated for this study. EfficientNet-B0 <ref type="bibr" target="#b36">[37]</ref> and EfficientNetV2-S <ref type="bibr" target="#b37">[38]</ref> were both experimented with, but their results on a local validation set in early experiments were not as promising as those of Metaformer and CAFormer. Results are not shown for these experiments since they are not directly comparable to the experiments reported. When the image resolution is increased at inference time, CAFormer-S18 outperforms Metaformer-0 despite not utilizing the metadata leveraged by Metaformer-0 and Metaformer-2; in general, CAFormer performed almost as well as Metaformer without any metadata information. Future work could evaluate merging the two architectures into a CAFormer with a head for the metadata information. An ensemble of three CAFormer-S18 models that vary only in their training and validation data split performs nearly as well as ensembles of Metaformer-0, Metaformer-2, and CAFormer-S18. The best-performing ensemble was Metaformer-0, Metaformer-2, and CAFormer-S18 (split C). Split C performs better than the other two CAFormer-S18 data splits, which provides a likely explanation as to why this ensemble outperformed Metaformer-0, Metaformer-2, and CAFormer-S18 (split A).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Optimization of OpenWGAN-GP training</head><p>OpenWGAN-GP training was optimized with respect to the hidden dimension size, the ratio of closed- and open-set samples, and whether training augmentations were applied. Table <ref type="table">3</ref> shows that the best Track 3 performance is achieved when using training augmentations and oversampling the open-set data to roughly the same number of samples as the closed-set data. This case is represented as weighted undersampling (w.u.) for closed-set sampling and "3x all" sampling for open-set sampling, which denotes oversampling the open-set dataset completely 3 times with training augmentations to increase the diversity of representations of the limited open-set data; a sketch is given below. These settings are used for all results shown in Table <ref type="table" target="#tab_1">2</ref> and for the OpenWGAN-GP results in Table <ref type="table" target="#tab_0">1</ref>. For sampling the closed-set data, weighted undersampling outperforms random undersampling and balanced undersampling. In cases where there is a >5% disparity between the number of samples in the open and closed sets, training is performed with balanced sampling between the open and closed sets; this pertains to all combinations except the 3x oversampling of the open set with weighted undersampling of the closed set. Local experimentation suggested that using the entire closed-set dataset could yield a slight increase in Track 3 performance, but this dramatically increases the training time for the OpenWGAN-GP classifier (experiments not shown). While the Track 3 performance does not appear to be particularly sensitive to the hidden dimension size, the trend suggests that a smaller hidden dimension may have slightly improved performance, as shown in Table <ref type="table">4</ref>. More exhaustive combinations could not be evaluated due to competition submission limitations.</p></div>
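A sketch of the best-performing sampling configuration from Table 3, under assumed dataset objects (open_set_aug and closed_set are hypothetical); interpreting "weighted undersampling" as class-frequency-weighted sampling without replacement is my reading, not a confirmed detail.

```python
import torch
from torch.utils.data import ConcatDataset, WeightedRandomSampler

# "3x all": three augmented passes over every open-set sample.
open_train = ConcatDataset([open_set_aug] * 3)

# Weighted undersampling of the closed set down to the open-set size,
# with rarer classes sampled proportionally more often.
targets = torch.tensor(closed_set.targets)
class_counts = torch.bincount(targets).float()
weights = 1.0 / class_counts[targets]
closed_sampler = WeightedRandomSampler(
    weights, num_samples=len(open_train), replacement=False,
)
```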
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Test-time augmentations</head><p>Several test-time augmentations were evaluated, and the performance of each is shown in Table <ref type="table">5</ref>. Multi-instance averaging, averaging of horizontal flips of the same image, averaging of multiple crops of the same image, and averaging over multiple image sizes were explored with Metaformer-0 trained on split D. Since Metaformer-0 does not support inference at a different resolution than the training resolution (in this case 384×384), the image size in Table <ref type="table">5</ref> refers to the image resolution of the shorter dimension before a square crop of 384. For example, if the image size is 441, then the image is resized to 441 along the shorter dimension (assuming a rectangular image) and a square center crop of 384 is then taken. As such, the image size must be at least 384 for Metaformer-0. Each augmentation improves performance individually and in combination. Of the test-time augmentations experimented with, multi-instance averaging has the greatest impact of any individual transformation. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6.">Leaderboard performance</head><p>Public leaderboard performance is shown in Table <ref type="table" target="#tab_5">10</ref> for teams that selected models for private leaderboard evaluation. Private leaderboard performance for the selected models of each team is shown in Table <ref type="table" target="#tab_6">11</ref>. The best performance for each metric is independently reported for each team, which means that the results for a team may represent distinct solutions for each metric. My models achieved the best performance for Track 1 and accuracy on both the public and private leaderboards and the best F1 on the private leaderboard. My models placed 3rd on the private leaderboard for Track 2 and 2nd for Track 3, indicating that poisonous → edible misclassification could still be improved for the methods presented here. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>The proposed methodology for open-set recognition of fungi species addresses the critical challenge of distinguishing between edible and poisonous mushrooms while effectively identifying unknown species. This study demonstrates the potential of combining Metaformer and CAFormer models to achieve robust classification performance. The integration of metadata in the Metaformer models significantly enhances their ability to leverage additional contextual information, thereby improving classification accuracy. However, one notable challenge is the current evaluation metrics, which assume unknown species are edible. This assumption may not be ideal if the models are intended for foraging contexts, where new poisonous species of mushrooms are continually discovered. Re-evaluating these metrics to consider unknown species as potentially poisonous could mitigate the tension between open-set classification and the misclassification of poisonous species, thereby enhancing the practical applicability of the models in real-world scenarios. The current structure of the metrics, which treats unknown species as edible, puts open-set recognition and poisonous species identification in direct opposition with each other, since misclassifying a closed-set poisonous species as unknown is heavily penalized; this makes their joint optimization challenging. If the intention is to build a system with both high detection rates for unknown species and high recall for poisonous mushrooms, the high penalty for poisonous → edible misclassification would work in favor of, rather than against, identification of unknown species if unknowns were assumed poisonous instead of edible. This may ultimately improve the model's performance in both aspects. Intuitively, the open set should be more diverse than the closed set, so sharing its label with the inherently less realistic generated data seems the more logical choice in most scenarios where greater diversity is expected in the open-set data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Future work</head><p>Future research should explore the redefinition of evaluation metrics to account for the possibility of unknown species being poisonous. This adjustment could reduce the conflict between optimizing for open-set classification and minimizing poisonous species misclassification. Additionally, investigating more sophisticated ensembling techniques and incorporating advanced data augmentation strategies could further improve model performance. Exploring the use of few-shot learning techniques might address the challenge posed by classes with very few observations. Finally, expanding the application of the proposed OpenWGAN-GP framework to other domains with similar classification challenges could validate its versatility and robustness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusions</head><p>This paper presents a novel method for open-set recognition of fungi species. The integration of WGAN-GP training optimizations into OpenGAN, resulting in OpenWGAN-GP, enhances training stability and enables lightweight discriminators to effectively identify unknown fungi species. An ensemble of Metaformer and CAFormer models is leveraged to classify fungi accurately while avoiding the misclassification of poisonous mushrooms as edible. The application of carefully chosen test-time augmentations, such as image resolution adjustments, horizontal flipping, and multi-instance averaging, dramatically improves classification performance. These techniques collectively contributed to achieving 1st place in the FungiCLEF 2024 competition for Track 1, F1, and Accuracy, and 2nd place for the final ranking metric, Track 3, which combines the edible-poisonous confusion loss (Track 2) with the standard misclassification loss including the unknown class (Track 1).</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: OpenGAN training paradigm. After a closed-set classifier has been trained, an open-set discriminator is trained through both A) supervised learning and B) adversarial training using the penultimate-layer feature representation of closed- and open-set observations. A generator is trained adversarially, with a loss that is minimized when the discriminator cannot distinguish between real features and features generated by the generator. This generator is used to supplement the open-set training data for the discriminator. Figure reproduced and modified from [13] with permission. OpenWGAN-GP utilizes this same fundamental training paradigm. In contrast to the implementation of OpenGAN, OpenWGAN-GP was trained with the open-set features, rather than the closed-set features, sharing the real label. To simplify the figure, metadata is not considered, but in the case of Metaformer closed-set classifier models, the metadata would be included along with the images as an additional input.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2:</head><label>2</label><figDesc>Figure 2: Inference pipeline. (a) Averaged closed-set classification probabilities and open-set recognition probabilities are generated for each observation via multi-instance averaging, horizontal flipping, and ensemble averaging. To generate predictions for each observation, all instances belonging to that observation are considered. The images for each instance are horizontally flipped, and both flips of every instance are used as input to each closed-set classifier in the ensemble. In the example shown, the observation has 3 instances, so 6 images would be used as input to each model. The number of instances per observation in the dataset is variable. Each closed-set classifier has a corresponding open-set discriminator model that has been trained on its penultimate-layer feature representations of open- and closed-set images, as illustrated in Figure 1. To simplify the figure, metadata was omitted, but for Metaformer models it is included as an additional input along with each image. The simplest case of a two-model ensemble is illustrated, with each closed-set classifier and open-set discriminator pair shown in a different color. The ensemble used for the final leaderboard evaluation was composed of more than two models. (b) The final prediction is determined based on both the prediction from the closed-set classifier ensemble average probabilities and the prediction from the open-set discriminator ensemble average probabilities. If the predicted fungus species is poisonous, then the open-set recognition classification is ignored. If the predicted fungus species is edible, then the open-set recognition classification is used to determine whether the species is unknown or belongs to the closed set.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>During OpenWGAN-GP training of the open-set discriminator, the same label was used for real features and open-set features. All results shown utilizing OpenWGAN-GP in Section 4 represent cases where the open-set label and real label are shared during training of the open-set discriminator. By assigning the same label to real features and open-set features during the supervised and adversarial phases of each update, respectively, the generator is incentivized to generate features that are indistinguishable from the features the closed-set classifier produces for open-set observations. This is the opposite of the mapping used in the implementation of OpenGAN, which shared the closed-set label with the real label and would therefore generate supplemental closed-set features instead. Initial experiments showed that a higher macro-F1 score was achieved between open- and closed-set validation examples for the FungiCLEF 2024 dataset when the real label was shared with the open-set label rather than the closed-set label. When the open-set label is the same as the real label, as in this work, the generator generates fake open-set features, and the discriminator predicts less realistic features as closed-set features.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Thresholding experiments. All experiments are performed with an ensemble of three CAFormer-S18 (A, B, and C data splits) using optimized training for OpenWGAN-GP, horizontal flips, and multi-instance averaging. OpenWGAN-GP has comparable or better performance across Track 1, Track 2, and Track 3 metrics without the need to tune a threshold per test dataset.</figDesc><table><row><cell cols="6">Open-set recognition Ignore Poison Pred Temperature scaling Track1↓ Track2↓ Track3↓</cell><cell>F1↑</cell></row><row><cell>Softmax T=0.25</cell><cell></cell><cell></cell><cell>0.3654</cell><cell>0.1751</cell><cell>0.5405</cell><cell>52.04</cell></row><row><cell>Softmax T=0.25</cell><cell>-</cell><cell></cell><cell>0.3646</cell><cell>0.1847</cell><cell>0.5494</cell><cell>52.08</cell></row><row><cell>Softmax T=0.2</cell><cell>-</cell><cell></cell><cell>0.3676</cell><cell>0.1798</cell><cell>0.5473</cell><cell>51.51</cell></row><row><cell>Entropy T=2</cell><cell></cell><cell></cell><cell>0.3752</cell><cell>0.1808</cell><cell>0.5559</cell><cell>52.52</cell></row><row><cell>Entropy T=2</cell><cell>-</cell><cell></cell><cell>0.3738</cell><cell>0.3163</cell><cell>0.6901</cell><cell>52.61</cell></row><row><cell>Entropy T=3</cell><cell></cell><cell></cell><cell>0.3705</cell><cell>0.1751</cell><cell>0.5456</cell><cell>51</cell></row><row><cell>Entropy T=3</cell><cell>-</cell><cell></cell><cell>0.3699</cell><cell>0.2296</cell><cell>0.5994</cell><cell>51.48</cell></row><row><cell>Entropy T=4</cell><cell>-</cell><cell></cell><cell>0.3726</cell><cell>0.1746</cell><cell>0.5472</cell><cell>50.49</cell></row><row><cell>Entropy T=5</cell><cell>-</cell><cell></cell><cell>0.3756</cell><cell>0.175</cell><cell>0.5506</cell><cell>50.01</cell></row><row><cell>Entropy T=6</cell><cell>-</cell><cell></cell><cell>0.377</cell><cell>0.1751</cell><cell>0.5521</cell><cell>49.54</cell></row><row><cell>Entropy T=7</cell><cell>-</cell><cell></cell><cell>0.3773</cell><cell>0.1751</cell><cell>0.5524</cell><cell>49.4</cell></row><row><cell>OpenWGAN-GP</cell><cell>-</cell><cell>-</cell><cell>0.2287</cell><cell>0.4438</cell><cell>0.6725</cell><cell>49.06</cell></row><row><cell>OpenWGAN-GP</cell><cell></cell><cell>-</cell><cell>0.2458</cell><cell>0.1756</cell><cell>0.4213</cell><cell>49.22</cell></row><row><cell>None</cell><cell>-</cell><cell>-</cell><cell>0.3789</cell><cell>0.1811</cell><cell>0.56</cell><cell>48.95</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 Ensemble</head><label>2</label><figDesc></figDesc><table><row><cell cols="4">Model(s) (data split) Track1↓ Track2↓ Track3↓</cell><cell>F1↑</cell></row><row><cell>Metaformer-0 (D)</cell><cell>0.2988</cell><cell>0.1864</cell><cell>0.4852</cell><cell>45.14</cell></row><row><cell>Metaformer-2 (D)</cell><cell>0.2944</cell><cell>0.2082</cell><cell>0.5026</cell><cell>45.07</cell></row><row><cell>CAFormer-S18 (A)</cell><cell>0.2945</cell><cell>0.2042</cell><cell>0.4987</cell><cell>46.47</cell></row><row><cell>CAFormer-S18 (B)</cell><cell>0.2819</cell><cell>0.1766</cell><cell>0.4585</cell><cell>46.53</cell></row><row><cell>CAFormer-S18 (C)</cell><cell>0.2796</cell><cell>0.1759</cell><cell>0.4555</cell><cell>44.67</cell></row><row><cell>CAFormer-S18 (A), CAFormer-S18 (B), CAFormer-S18 (C)</cell><cell>0.2458</cell><cell>0.1756</cell><cell>0.4213</cell><cell>49.22</cell></row><row><cell>Metaformer-0 (D), CAFormer-S18 (B), CAFormer-S18 (C)</cell><cell>0.2424</cell><cell>0.1857</cell><cell>0.4281</cell><cell>49.71</cell></row><row><cell>Metaformer-0 (D), Metaformer-2 (D), CAFormer-S18 (A)</cell><cell>0.2436</cell><cell>0.1737</cell><cell>0.4174</cell><cell>49.89</cell></row><row><cell cols="2">Metaformer-0 (D), Metaformer-2 (D), CAFormer-S18 (C) 0.2394</cell><cell>0.1681</cell><cell>0.4075</cell><cell>49.81</cell></row></table><note>experiments. CAFormer-S18 shows strong performance relative to Metaformers despite not utilizing metadata information. Multiple splits of the data allow a simple method for ensembling and can improve performance beyond what can be achieved solely through employing different architectures in an ensemble. All experiments are performed using optimized training for OpenWGAN-GP, horizontal flips, and multi-instance averaging. Metaformer-0 and Metaformer-2 are always trained using train-val data split D. The data split each model in the ensemble was trained on is shown in parentheses (e.g. CAFormer-S18 (A) represents a CAFormer-S18 model trained on data split A).</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 7</head><label>7</label><figDesc>Poison loss ablation. Both models use image size 576 with no additional augmentations (no multi-instance, horizontal flip, or multicrop averaging). Both models use the optimized OpenWGAN-GP and ignore the OpenWGAN-GP prediction in the case of the top classification prediction belonging to a poisonous species.</figDesc><table><row><cell cols="5">Model (data split) Poison loss Track1↓ Track2↓ Track3↓</cell><cell>F1↑</cell></row><row><cell>CAFormer-S18 (A)</cell><cell>-</cell><cell>0.3332</cell><cell>0.4175</cell><cell>0.7507</cell><cell>40.78</cell></row><row><cell>CAFormer-S18 (A)</cell><cell></cell><cell>0.3105</cell><cell>0.3071</cell><cell>0.6176</cell><cell>41.22</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 8</head><label>8</label><figDesc>LogitNorm ablation. As in Table 7, both models use image size 576 with no additional augmentations (no multi-instance, horizontal flip, or multicrop averaging). Both models use the optimized OpenWGAN-GP and ignore the OpenWGAN-GP prediction in the case of the top classification prediction belonging to a poisonous species. The baseline model is the same as in Table 7.</figDesc><table><row><cell cols="5">Model (data split) LogitNorm Track1↓ Track2↓ Track3↓</cell><cell>F1↑</cell></row><row><cell>CAFormer-S18 (A)</cell><cell>-</cell><cell>0.3370</cell><cell>0.2405</cell><cell>0.5775</cell><cell>38.47</cell></row><row><cell>CAFormer-S18 (A)</cell><cell></cell><cell>0.3105</cell><cell>0.3071</cell><cell>0.6176</cell><cell>41.22</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 9</head><label>9</label><figDesc>OpenWGAN-GP vs vanilla OpenGAN. All results shown use the same model for classification and thus to generate embeddings for OpenGAN training and inference. All models ignore the OpenGAN/OpenWGAN-GP prediction in the case of the top classification prediction belonging to a poisonous species. In all cases, the closed-set classification model is CAFormer-S18 trained on data split A. The baseline model is the same as in Tables 7 and 8.</figDesc><table><row><cell>OpenWGAN-GP data sampling optimizations</cell><cell>OpenGAN variant</cell><cell>Selection metric</cell><cell cols="3">Track1↓ Track2↓ Track3↓</cell><cell>F1↑</cell></row><row><cell></cell><cell>OpenGAN</cell><cell>F1 macro</cell><cell>0.4143</cell><cell>0.2735</cell><cell>0.6878</cell><cell>41.78</cell></row><row><cell>-</cell><cell>OpenGAN</cell><cell>ROC-AUC</cell><cell>0.4143</cell><cell>0.2735</cell><cell>0.6878</cell><cell>41.78</cell></row><row><cell></cell><cell>OpenWGAN-GP (ours)</cell><cell>F1 macro</cell><cell>0.3105</cell><cell>0.3071</cell><cell>0.6176</cell><cell>41.22</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 10</head><label>10</label><figDesc>Public leaderboard performance for teams with selected models. The best of each track is independently reported for each team, which means that results for each team may represent distinct solutions for each metric. Accuracy is used for the table ranking.</figDesc><table><row><cell>Rank</cell><cell>Team Name</cell><cell cols="3">Track1↓ Track2↓ Track3↓</cell><cell>F1↑</cell><cell>Accuracy↑</cell></row><row><cell>1</cell><cell cols="2">jack-etheredge (ours) 0.2394</cell><cell>0.1357</cell><cell>0.4075</cell><cell>52.08</cell><cell>76.06</cell></row><row><cell>2</cell><cell>chirmy</cell><cell>0.2641</cell><cell>0.4026</cell><cell>0.6667</cell><cell>46.65</cell><cell>73.59</cell></row><row><cell>3</cell><cell>IES</cell><cell>0.2691</cell><cell>0.0699</cell><cell cols="2">0.3621 56.55</cell><cell>73.09</cell></row><row><cell>4</cell><cell>TingTing1999</cell><cell>0.2734</cell><cell>0.4201</cell><cell>0.6934</cell><cell>44.39</cell><cell>72.66</cell></row><row><cell>5</cell><cell>upupup</cell><cell>0.368</cell><cell>0.1348</cell><cell>0.513</cell><cell>54.04</cell><cell>63.2</cell></row><row><cell>6</cell><cell>DS@GT</cell><cell>0.395</cell><cell>1.6493</cell><cell>2.0443</cell><cell>27.61</cell><cell>60.5</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 11</head><label>11</label><figDesc>Private leaderboard performance for teams with selected models. The best of each track is independently reported for each team, which means that results for each team may represent distinct solutions for each metric. Accuracy is used for the table ranking.</figDesc><table><row><cell>Rank</cell><cell>Team Name</cell><cell cols="3">Track1↓ Track2↓ Track3↓</cell><cell>F1↑</cell><cell>Accuracy↑</cell></row><row><cell>1</cell><cell cols="2">jack-etheredge (ours) 0.2436</cell><cell>0.1613</cell><cell>0.4075</cell><cell>56.79</cell><cell>75.64</cell></row><row><cell>2</cell><cell>chirmy</cell><cell>0.2693</cell><cell>0.4149</cell><cell>0.6667</cell><cell>51.75</cell><cell>73.07</cell></row><row><cell>3</cell><cell>TingTing1999</cell><cell>0.2749</cell><cell>0.4378</cell><cell>0.6934</cell><cell>51.42</cell><cell>72.51</cell></row><row><cell>4</cell><cell>IES</cell><cell>0.2958</cell><cell>0.0860</cell><cell>0.3621</cell><cell>56.41</cell><cell>70.42</cell></row><row><cell>5</cell><cell>upupup</cell><cell>0.3882</cell><cell>0.0718</cell><cell>0.513</cell><cell>54.80</cell><cell>61.18</cell></row><row><cell>6</cell><cell>DS@GT</cell><cell>0.3907</cell><cell>1.6040</cell><cell>2.0443</cell><cell>30.01</cell><cell>60.93</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The author would like to thank Jillian Etheredge for constructive criticism of the manuscript.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>OpenWGAN-GP training optimization. "n.o." refers to the number of open-set samples in thousands. "n.c." refers to the number of closed-set samples in thousands. The best performance for Track 3 was achieved by ignoring poison predictions ("i.p.p.") and balancing the datasets using oversampling for the open-set data and weighted undersampling of the closed-set data. Training augmentations were performed for both datasets. In all cases, the closed set is undersampled. For closed-set sampling ("closed sample"), "w.u." refers to weighted undersampling.</p><p>Track 3 performance is best for multi-instance averaging in combination with horizontal flip averaging among the combinations shown in Table <ref type="table">5</ref>. Despite improved performance on local validation, increasing the image resolution to 576 relative to the training resolution of 384 appears to have mixed results on the leaderboard, as shown in Table <ref type="table">6</ref>. For data split A, the Track 1 score is improved with a higher resolution, but Track 2, Track 3, and F1 performance are better with an image resolution of 384. Image resolution 384 seems favored overall.</p><p>Table <ref type="table">7</ref> shows that the removal of the poison loss degrades the performance of the model across all metrics. Track 2 has a larger percent change than Track 1 or F1, which is sensible given that Track 2 corresponds to the edible-poisonous confusion loss.</p><p>Removal of LogitNorm from training decreased performance for Track 1 and F1, as shown in Table <ref type="table">8</ref>. Presumably this is because LogitNorm improves the separation of the classes in the latent space, which is used by OpenWGAN-GP for unknown classification. Future work could explore whether LogitNorm also improves classification of fine-grained datasets in cases for which open-set recognition is not a consideration. Interestingly, removal of LogitNorm increases performance on Track 2, as shown in Table <ref type="table">8</ref>. The gain in Track 2 performance from the removal of LogitNorm is great enough that Track 3 (the sum of the Track 1 and Track 2 losses) is improved. This may suggest that LogitNorm is incompatible with the poison loss used in this work. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">OpenWGAN-GP</head><p>The training stability and overall performance of OpenWGAN-GP relative to the original OpenGAN are demonstrated in Table <ref type="table">9</ref>. The identical performance of OpenGAN with and without these training optimizations suggests that the particular failure state observed is OpenGAN classifying none of the test set observations as open-set. The data sampling optimizations explored in Table <ref type="table">3</ref> were not sufficient to overcome the failure of OpenGAN to learn a meaningful representation of the data for classification. The switch from ROC-AUC to F1 as the discriminator selection metric against the validation set apparently also made no difference, in light of this training failure. Since the same CAFormer-S18 classification model was used to generate the embeddings used by the OpenGAN variants, it appears that the improved Track 1 (and consequently Track 3) performance is the result of the addition of the WGAN-GP training paradigm to OpenGAN.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Mushroom Poisoning Outbreaks -China</title>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Lang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sun</surname></persName>
		</author>
		<idno type="DOI">10.46234/ccdcw2024.014</idno>
		<ptr target="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10832152/" />
	</analytic>
	<monogr>
		<title level="j">China CDC Weekly</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="64" to="68" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sulc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Matas</surname></persName>
		</author>
		<title level="m">Overview of FungiCLEF 2024: Revisiting fungi species recognition beyond 0-1 cost</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Overview of LifeCLEF 2024: Challenges on species distribution prediction and identification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Joly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Goëau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Espitalier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Botella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Deneu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Marcos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Estopinan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Leblanc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Larcher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Šulc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hrúz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Servajean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">MetaFormer: A Unified Meta Framework for Fine-Grained Recognition</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Diao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yuan</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2203.02751</idno>
		<idno type="arXiv">arXiv:2203.02751</idno>
		<ptr target="http://arxiv.org/abs/2203.02751.doi:10.48550/arXiv.2203.02751" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Van Horn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">Mac</forename><surname>Aodha</surname></persName>
		</author>
		<ptr target="https://kaggle.com/competitions/inaturalist-2021" />
		<title level="m">iNat Challenge 2021 -FGVC8</title>
				<imprint>
			<publisher>Kaggle</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Building a bird recognition app and large scale dataset with citizen scientists: The fine print in finegrained dataset collection</title>
		<author>
			<persName><forename type="first">G</forename><surname>Van Horn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Branson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Farrell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Haber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Barry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ipeirotis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Perona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Belongie</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2015.7298658</idno>
		<ptr target="http://ieeexplore.ieee.org/document/7298658/.doi:10.1109/CVPR.2015.7298658" />
	</analytic>
	<monogr>
		<title level="m">2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR</title>
				<meeting><address><addrLine>Boston, MA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2015">2015. 2015</date>
			<biblScope unit="volume">9781467369640</biblScope>
			<biblScope unit="page" from="595" to="604" />
		</imprint>
	</monogr>
	<note>IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Wah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Branson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Welinder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Perona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Belongie</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:16119123" />
		<title level="m">The Caltech-UCSD Birds-200-2011 Dataset</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Šulc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chamidullin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Matas</surname></persName>
		</author>
		<title level="m">Working Notes of CLEF 2023 -Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>Overview of FungiCLEF 2023: Fungi Recognition Beyond 1/0 Cost</note>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Learning Multiple Layers of Features from Tiny Images</title>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:18268744" />
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Tiny ImageNet Visual Recognition Challenge</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">S</forename><surname>Yang</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:16664790" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">MNIST handwritten digit database</title>
		<author>
			<persName><forename type="first">Y</forename><surname>LeCun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cortes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Burges</surname></persName>
		</author>
		<ptr target="http://yann.lecun.com/exdb/mnist" />
	</analytic>
	<monogr>
		<title level="j">ATT Labs</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Reading Digits in Natural Images with Unsupervised Feature Learning</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Netzer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Coates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bissacco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ng</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:16852518" />
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Kong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ramanan</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2104.02939</idno>
		<idno type="arXiv">arXiv:2104.02939</idno>
		<ptr target="http://arxiv.org/abs/2104.02939.doi:10.48550/arXiv.2104.02939" />
		<title level="m">OpenGAN: Open-Set Recognition via Open Data Generation</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks</title>
		<author>
			<persName><forename type="first">D</forename><surname>Hendrycks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:13046179" />
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">ArXiv</note>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Scaling Out-of-Distribution Detection for Real-World Settings</title>
		<author>
			<persName><forename type="first">D</forename><surname>Hendrycks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Basart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mazeika</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mostajabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Steinhardt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Song</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:227407829" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Open-Set Recognition: A Good Closed-Set Classifier is All You Need?</title>
		<author>
			<persName><forename type="first">S</forename><surname>Vaze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2110.06207</idno>
		<ptr target="https://api.semanticscholar.org/CorpusID:238634102" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Entropy-guided open-set fine-grained fungi recognition</title>
		<author>
			<persName><forename type="first">H</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhang</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:264441405" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2023 -Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Toward Open Set Recognition</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">J</forename><surname>Scheirer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>De Rezende Rocha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sapkota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">E</forename><surname>Boult</surname></persName>
		</author>
		<idno type="DOI">10.1109/TPAMI.2012.256</idno>
		<ptr target="http://ieeexplore.ieee.org/document/6365193/.doi:10.1109/TPAMI.2012.256" />
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="1757" to="1772" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Danish Fungi 2020 -Not Just Another Image Recognition Dataset</title>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Šulc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Matas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Heilmann-Clausen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">S</forename><surname>Jeppesen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Laessøe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Frøslev</surname></persName>
		</author>
		<idno type="DOI">10.1109/WACV51458.2022.00334</idno>
		<idno type="arXiv">arXiv:2103.10107</idno>
		<ptr target="http://arxiv.org/abs/2103.10107.doi:10.1109/WACV51458.2022.00334" />
	</analytic>
	<monogr>
		<title level="m">IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)</title>
				<imprint>
			<date type="published" when="2022">2022. 2022</date>
			<biblScope unit="page" from="3281" to="3291" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Seesaw Loss for Long-Tailed Instance Segmentation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Loy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lin</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2008.10032</idno>
		<idno type="arXiv">arXiv:2008.10032</idno>
		<ptr target="http://arxiv.org/abs/2008.10032.doi:10.48550/arXiv.2008.10032" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">MetaFormer is Actually What You Need for Vision</title>
		<author>
			<persName><forename type="first">W</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Si</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yan</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR52688.2022.01055</idno>
		<ptr target="https://ieeexplore.ieee.org/document/9879612/.doi:10.1109/CVPR52688.2022.01055" />
	</analytic>
	<monogr>
		<title level="m">2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR</title>
				<meeting><address><addrLine>New Orleans, LA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2022">2022. 2022</date>
			<biblScope unit="page" from="10809" to="10819" />
		</imprint>
	</monogr>
	<note>IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">PyTorch: An Imperative Style, High-Performance Deep Learning Library</title>
		<author>
			<persName><forename type="first">A</forename><surname>Paszke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Massa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lerer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bradbury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Killeen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Gimelshein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Antiga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Desmaison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kopf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Devito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Raison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tejani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chilamkurthy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Steiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chintala</surname></persName>
		</author>
		<ptr target="https://papers.nips.cc/paper_files/paper/2019/hash/bdbca288fee7f92f2bfa9f7012727740-Abstract.html" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">32</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Wisdom of Committees: An Overlooked Approach To Faster and More Accurate Models</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kondratyuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Christiansen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Kitani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Alon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Eban</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2012.01988</idno>
		<idno type="arXiv">arXiv:2012.01988</idno>
		<ptr target="http://arxiv.org/abs/2012.01988.doi:10.48550/arXiv.2012.01988" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">MetaFormer Baselines for Vision</title>
		<author>
			<persName><forename type="first">W</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Si</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1109/TPAMI.2023.3329173</idno>
		<idno type="arXiv">arXiv:2210.13452</idno>
		<ptr target="http://arxiv.org/abs/2210.13452.doi:10.1109/TPAMI.2023.3329173" />
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="page" from="896" to="912" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">ImageNet: A large-scale hierarchical image database</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fei-Fei</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2009.5206848</idno>
		<ptr target="https://ieeexplore.ieee.org/document/5206848.doi:10.1109/CVPR.2009.5206848" />
	</analytic>
	<monogr>
		<title level="m">IEEE Conference on Computer Vision and Pattern Recognition</title>
				<imprint>
			<date type="published" when="1063">2009. 2009. 1063-6919</date>
			<biblScope unit="page" from="248" to="255" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Improved Training of Wasserstein GANs</title>
		<author>
			<persName><forename type="first">I</forename><surname>Gulrajani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ahmed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Arjovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dumoulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1704.00028</idno>
		<idno type="arXiv">arXiv:1704.00028</idno>
		<ptr target="http://arxiv.org/abs/1704.00028.doi:10.48550/arXiv.1704.00028" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ioffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1502.03167</idno>
		<idno type="arXiv">arXiv:1502.03167</idno>
		<ptr target="http://arxiv.org/abs/1502.03167.doi:10.48550/arXiv.1502.03167" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Empirical Evaluation of Rectified Activations in Convolutional Network</title>
		<author>
			<persName><forename type="first">B</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1505.00853</idno>
		<idno type="arXiv">arXiv:1505.00853</idno>
		<ptr target="http://arxiv.org/abs/1505.00853.doi:10.48550/arXiv.1505.00853" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">Generative Adversarial Networks</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">J</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pouget-Abadie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mirza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warde-Farley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ozair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1406.2661</idno>
		<idno type="arXiv">arXiv:1406.2661</idno>
		<ptr target="http://arxiv.org/abs/1406.2661.doi:10.48550/arXiv.1406.2661" />
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6980</idno>
		<ptr target="https://api.semanticscholar.org/CorpusID:6628106" />
		<title level="m">Adam: A Method for Stochastic Optimization</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title level="m" type="main">Decoupled Weight Decay Regularization</title>
		<author>
			<persName><forename type="first">I</forename><surname>Loshchilov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1711.05101</idno>
		<idno type="arXiv">arXiv:1711.05101</idno>
		<ptr target="http://arxiv.org/abs/1711.05101.doi:10.48550/arXiv.1711.05101" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<title level="m" type="main">Mitigating Neural Network Overconfidence with Logit Normalization</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>An</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2205.09310</idno>
		<idno type="arXiv">arXiv:2205.09310</idno>
		<ptr target="http://arxiv.org/abs/2205.09310.doi:10.48550/arXiv.2205.09310" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">On Calibration of Modern Neural Networks</title>
		<author>
			<persName><forename type="first">C</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pleiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">Q</forename><surname>Weinberger</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.04599</idno>
		<ptr target="http://arxiv.org/abs/1706.04599" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">G</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2103.10158</idno>
		<idno type="arXiv">arXiv:2103.10158</idno>
		<ptr target="http://arxiv.org/abs/2103.10158.doi:10.48550/arXiv.2103.10158" />
		<title level="m">TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jia</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2001.04086</idno>
		<idno type="arXiv">arXiv:2001.04086</idno>
		<ptr target="http://arxiv.org/abs/2001.04086.doi:10.48550/arXiv.2001.04086" />
		<title level="m">GridMask Data Augmentation</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<title level="m" type="main">Fixing the train-test resolution discrepancy</title>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Douze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jégou</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1906.06423</idno>
		<idno type="arXiv">arXiv:1906.06423</idno>
		<ptr target="http://arxiv.org/abs/1906.06423.doi:10.48550/arXiv.1906.06423" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<monogr>
		<title level="m" type="main">EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:167217261" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">ArXiv</note>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">EfficientNetV2: Smaller Models and Faster Training</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:232478903" />
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
