<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>OpenWGAN-GP for Fine-Grained Open-Set Fungi Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jack N. Etheredge</string-name>
          <email>jack.etheredge@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Twosense</institution>
          ,
          <addr-line>New York, New York</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Understanding and accurately classifying fungi is crucial for ecological studies, food safety, and public health. In this paper, I present my approach to the FungiCLEF 2024 challenge, which aims to classify images of fungi, identify open-set “unknown” fungi species in the test data, and reduce the confusion between edible and poisonous fungi. This method leverages a combination of Metaformer-0, Metaformer-2, and CAFormer-S18 models, chosen for their strong classification performance relative to their computational efficiency. The Metaformer models utilize metadata, while CAFormer-S18 does not, yet all belong to the same family of models known as Metaformers and employ convolutional blocks followed by multi-headed self-attention transformer blocks. My primary novel contribution is the application of OpenGAN to detect unknown fungi species, enhanced by incorporating WGAN-GP to improve training stability, resulting in a new open-set classifier training paradigm I term OpenWGAN-GP. This approach enables a lightweight discriminator to utilize the latent representations from the closed-set classifier for binary classification of open-set vs. closed-set species. My best-performing ensemble achieved public leaderboard scores of 0.2394 for Track 1, 0.1681 for Track 2, and 0.4075 for Track 3, along with a macro-averaged F1 score of 49.81%. Track 1 represents the classification loss with unknowns, Track 2 represents the edible-poisonous confusion loss (weighted heavily for poisonous → edible misclassifications), and Track 3 represents the sum of the Track 1 and Track 2 losses. My method secured 1st place in the FungiCLEF 2024 competition for Track 1, F1, and Accuracy on the private leaderboard. Code is available at https://github.com/Jack-Etheredge/fungiclef2024.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Correctly identifying mushroom species and distinguishing between poisonous and edible varieties
are critical for public health. In 2023 alone, China had 1,303 reported cases of mushroom poisoning
traced to 97 species of mushroom, of which 12 were newly discovered species [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. FungiCLEF 2024 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
is a competition held as part of the LifeCLEF 2024 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] lab. FungiCLEF is a long-tailed fine-grained
open-set classification task with an additional asymmetrically weighted edible-poisonous confusion
component. In this work, I propose a novel solution for fine-grained classification of fungi species
that simultaneously minimizes confusion between edible and poisonous species as well as detecting
species of fungi unknown to the training dataset. The primary contributions of this work are 1) the
use of an open-set recognition classifier trained using the embeddings from the closed-set classifier
applied to fine-grained open-set recognition as well as 2) leveraging an ensemble of computationally
lean models with carefully selected test-time augmentations. Extensive experimentation was used to
improve the training methodology of this discriminator. I refer to this optimized open-set classifier
training paradigm as OpenWGAN-GP.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <sec id="sec-2-1">
        <title>2.1. Fine-grained classification</title>
        <p>
          Fine-grained classification involves classifying data that belong to the same broader category.
Fine-grained classification is challenging due to the high intra-class variation and low inter-class variation, in
contrast to standard coarse-grained recognition. Fine-grained data, like open-set data, are ubiquitous in
the real world, and addressing the unique challenges posed by both has become the subject of more
intense academic interest recently. Metaformer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] has been shown to perform well across a diverse set
of fine-grained datasets such as iNaturalist [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], NABirds [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], and CUB-200-2011 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] by incorporating
metadata information into a hybrid convolutional vision transformer architecture. Many methods have
been employed recently to classify fungi for the FungiCLEF dataset [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], including the aforementioned
Metaformer.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Open-set recognition</title>
        <p>
          Most works on open-set recognition primarily utilize coarse-grained classification datasets such as
CIFAR-10 [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and Tiny-ImageNet [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], alongside the fine-grained digit recognition datasets MNIST
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and SVHN [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], as outlined in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Some studies even involve cross-dataset comparisons, which
generally fall into the broader category of out-of-distribution detection. However, open-set recognition
within fine-grained classification is more challenging because the open-set data shares low inter-class
variation with the closed-set data, as both belong to the same coarse-grained category or super-category
(e.g., all fungi). For instance, while an image of a dog is clearly out-of-distribution for a fungi dataset,
an unseen species of fungus is more semantically similar to known fungi species, making it harder to
identify as unknown. This highlights the unique challenge of fine-grained open-set recognition.
        </p>
        <p>
          FungiCLEF presents an opportunity to address this challenge by applying open-set recognition
techniques to fine-grained data, a task where many methods have shown promise [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Various techniques
for open-set recognition and out-of-distribution detection exist, ranging from simple threshold-based
methods such as maximum softmax probability [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], maximum logit [
          <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
          ], or softmax entropy [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ],
to K+1 way classifier which treats all unknowns as an additional class [
          <xref ref-type="bibr" rid="ref13 ref18">18, 13</xref>
          ]. More advanced methods
involve specialized models for handling open-set data.
        </p>
        <p>
          In this work, I utilized OpenGAN [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], which belongs to the category of specialized models. OpenGAN
trains a binary classifier (the discriminator) using labeled data from both the closed-set and open-set,
with a generator creating synthetic examples of open-set data during discriminator training to aid the
discriminator in generalizing beyond the labeled open-set data. I enhanced this approach by improving
the training stability of the open-set classifier, applying it to the fine-grained FungiCLEF 2024 dataset
to effectively detect unknown fungi species.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>
          The FungiCLEF 2024 challenge [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] dataset originates from the Danish Fungi 2020 dataset [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], which
comprises 295,938 training images representing 1,604 species primarily observed within Denmark. Each
training sample underwent rigorous expert validation to ensure precise labeling. In addition to images,
comprehensive observation metadata, including habitat, substrate, time, and location, are provided. The
validation set consists of 30,131 observations encompassing 60,832 images across 2,713 species, spanning
the entirety of the year and capturing observations from diverse substrate and habitat categories.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Competition objective</title>
        <p>
          FungiCLEF represents two distinct challenges. One challenge is fine-grained and long-tailed
classification. Seesaw loss [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] was used to counteract the effect of the long-tailed class distribution. Seesaw loss
is a modification of the standard cross-entropy loss. Given predicted logits z_i and rebalanced
probabilities σ̂_i from the classifier, where y_i is the one-hot encoded ground truth label with
1 ≤ i ≤ C, seesaw loss is defined over classes i as
L_seesaw = − ∑_{i=1}^{C} y_i log(σ̂_i),
where σ̂_i is defined as
σ̂_i = e^{z_i} / ( ∑_{j≠i} S_{ij} e^{z_j} + e^{z_i} ).
S_{ij} is a balancing coefficient between different classes. S_{ij} is determined by a combination of a
mitigation factor M_{ij} and a compensation factor C_{ij}:
S_{ij} = M_{ij} ⋅ C_{ij}.
M_{ij} mitigates the penalty on tail classes based on their instance ratio compared to head classes by
decreasing the penalty on class j relative to the ratio of instance counts between the less abundant tail
class j and the more abundant head class i. Conversely, C_{ij} increases the penalty on class j whenever
a misclassification occurs from class i to class j. This dual-factor approach in S_{ij} allows seesaw loss
to dynamically adjust penalties based on both instance distribution and misclassification behavior,
optimizing the learning process in long-tailed multi-class classification tasks. The loss function is
explained in greater detail in the original paper [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
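<p>As a concrete illustration, the seesaw rebalancing can be sketched for a single sample in NumPy. This is a simplified sketch, not the training implementation; the function name and the exponents p = 0.8 and q = 2.0 (the defaults reported in the seesaw loss paper) are assumptions on my part.</p>

```python
import numpy as np

def seesaw_loss(logits, target, counts, p=0.8, q=2.0):
    """Simplified single-sample sketch of seesaw loss.

    logits: (C,) predicted logits z
    target: int, ground-truth class index
    counts: (C,) per-class instance counts
    p, q:   mitigation / compensation exponents
    """
    z = logits - logits.max()              # numerical stability
    sigma = np.exp(z) / np.exp(z).sum()    # plain softmax probabilities
    # S_tj = M_tj * C_tj for the true class t against every other class j
    M = np.minimum(counts / max(counts[target], 1), 1.0) ** p
    C = np.maximum(sigma / max(sigma[target], 1e-12), 1.0) ** q
    S = M * C
    num = np.exp(z[target])
    den = num + sum(S[j] * np.exp(z[j]) for j in range(len(z)) if j != target)
    return -np.log(num / den)
```

<p>Because the mitigation factor shrinks the contribution of tail classes in the denominator, the penalty on a well-represented head class is never larger than under plain cross-entropy.</p>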
        <p>
          The closed-set image classification models used to classify the images belong to a family of hybrid
convolutional and self-attention transformer models known as Metaformers [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. These models are
explained in detail in Section 3.5.
        </p>
        <p>The other challenge is the recognition of the open-set “unknown” class. OpenWGAN-GP was used to
classify images as belonging to the closed-set or open-set datasets. The architecture of the open-set
discriminator and the OpenWGAN-GP training methodology are described in greater detail in Section
3.6.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Evaluation metrics</title>
        <p>The public leaderboard for the competition reported multiple metrics for each submission. Track 1 was
a classification loss that included unknowns. Track 2 was an edible-poisonous confusion loss with a
×100 weight for poisonous → edible misclassifications. Track 3 was the sum of the Track 1 and Track 2
losses. Additionally, the macro-averaged F1 score and accuracy were reported. The accuracy has been
ignored for all experimental results reported here. Apart from the macro-averaged F1 score, none of
the metrics correct for class imbalance. Thus the classification performance on each class affects the
final performance for Tracks 1-3 in proportion to the number of observations for that class.</p>
        <p>Track 1 loss is a standard classification error with an additional “unknown” class:
T1 = ∑_i L1(y_i, q(x_i)),
for class predictions q(x_i) for observations x_i from a classifier q and true labels y_i. The cost function L1 is
defined as
L1(y, q(x)) = 0 if q(x) = y, and 1 otherwise.</p>
        <p>Track 2 loss penalizes the confusion of edible and poisonous species. Consider a function p that
indicates poisonous species as p(y) = 1 if species y is poisonous, and p(y) = 0 otherwise. Let c_PE
denote the cost for a poisonous → edible misclassification (a poisonous observation was predicted as
edible) and c_EP the cost for an edible → poisonous misclassification, with c_PE = 100 and c_EP = 1.
For class predictions q(x_i) for observations x_i from a classifier q and true labels y_i as in T1, the cost
function L2 is defined as
L2(y, q(x)) = 0 if p(y) = p(q(x)); c_PE if p(y) = 1 and p(q(x)) = 0; c_EP otherwise,
with T2 = ∑_i L2(y_i, q(x_i)).</p>
        <p>Track 3 (the “user-focused loss”) is simply the sum of the Track 1 and Track 2 losses:
T3 = T1 + T2.</p>
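<p>Putting these definitions together, the three track losses can be sketched as follows. The function and its arguments are hypothetical (not the official evaluation code), and the “unknown” class is assumed to be mapped to edible by the poison indicator:</p>

```python
def track_losses(y_true, y_pred, is_poisonous, c_pe=100, c_ep=1):
    """Sketch of the Track 1-3 losses.

    is_poisonous: maps a class id (including 'unknown') to 0/1.
    c_pe: cost of a poisonous observation predicted as edible.
    c_ep: cost of an edible observation predicted as poisonous.
    """
    # Track 1: plain misclassification count over all observations
    t1 = sum(1 for t, p in zip(y_true, y_pred) if p != t)
    # Track 2: asymmetric edible-poisonous confusion cost
    t2 = 0
    for t, p in zip(y_true, y_pred):
        if is_poisonous(t) == is_poisonous(p):
            continue
        t2 += c_pe if is_poisonous(t) == 1 else c_ep
    # Track 3: the "user-focused" sum of the two losses
    return t1, t2, t1 + t2
```
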
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Custom poison loss</title>
        <p>A custom poison loss was used for all of the final models. The poison loss was formulated as a
class-weighted binary cross-entropy loss.</p>
        <p>Given the set of poisonous classes P and the set of edible classes E, the probabilities of each group
are summed independently, where p_i is the softmax probability output for class i:
p_poisonous = ∑_{i ∈ P} p_i,
p_edible = ∑_{i ∈ E} p_i.</p>
        <p>Let y ∈ {0, 1} be the binary ground truth label for an image, where 1 indicates a poisonous class and
0 indicates an edible class. w = 100 is the weight assigned to the edible class to penalize edible →
poisonous misclassifications. Thus, the weighted cross-entropy loss is as follows:
L_poison = −[ y log(p_poisonous) + w (1 − y) log(p_edible) ].</p>
        <p>Since the softmax probabilities output by the model sum to 1, the probabilities for all the poisonous
classes and all the edible classes were summed independently and used as the predictions for the binary
cross-entropy criterion. A weight of 100 was assigned to the edible class, since edible → poisonous
misclassifications (true label is edible, predicted label is poisonous) were penalized ×100 in the Track 2 loss.
The total training loss was the sum of the seesaw loss and the custom poison loss. This approach ensures
that the training process emphasizes correctly classifying edible species as edible, thus reducing the
risk of mistakenly classifying edible species as poisonous, which is heavily penalized in the evaluation
metrics.</p>
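<p>A minimal PyTorch sketch of this class-weighted binary cross-entropy poison loss, assuming hypothetical index lists for the two groups and a small clamp for numerical safety:</p>

```python
import torch

def poison_loss(logits, poison_label, poison_idx, edible_idx, w_edible=100.0):
    """Sketch of the custom poison loss.

    logits:       (B, C) closed-set classifier outputs
    poison_label: (B,) 1 if the true species is poisonous, else 0
    poison_idx / edible_idx: class indices of each group (assumed lists)
    """
    probs = torch.softmax(logits, dim=1)
    # group probabilities: sum softmax mass over each set of classes
    p_poison = probs[:, poison_idx].sum(dim=1).clamp_min(1e-12)
    p_edible = probs[:, edible_idx].sum(dim=1).clamp_min(1e-12)
    y = poison_label.float()
    # weight on the edible term penalizes edible -> poisonous mistakes
    return -(y * torch.log(p_poison)
             + w_edible * (1 - y) * torch.log(p_edible)).mean()
```

<p>In training this term would be added to the seesaw loss, as described above.</p>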
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Model architectures</title>
        <p>All experiments were performed on a machine with a single NVIDIA RTX 3090 graphics card and
all models were trained using PyTorch [22]. Given that I was working on a single computer for this
competition, efficient use of the limited compute available was of critical importance. Ensembling
and test-time augmentations were used to increase performance while keeping training efficiency in
check. An ensemble of computationally lean models has been shown to outperform a single larger
model with respect to both training and inference cost [23]. Model architectures were chosen based
on their performance on ImageNet and/or iNaturalist relative to the computational complexity of the
models in TFLOPs, aiming for a final ensemble of at least two models. Test-time augmentations allow
for much greater performance without requiring any additional training of the models, making them a
particularly attractive target for optimization when compute and time are both limited.</p>
        <p>
          Metaformers [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] are a family of models that combine different tokenizers with a transformer backbone.
A collection of Metaformer models (Metaformer-0, Metaformer-1, and Metaformer-2) were created in
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] that combine metadata with the images to improve the classification performance of the models
on multiple fine-grained image datasets. CAFormer [24] models are very similar in architecture to the
Metaformer variants proposed in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], but do not make use of metadata. In the final ensemble,
Metaformer-0, Metaformer-2, and CAFormer-S18 were used. Hereafter, Metaformer refers to the Metaformer models
from [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] which incorporate metadata information.
        </p>
        <p>
          I fine-tuned CAFormer-S18 with weights pretrained on ImageNet-21K [25] while Metaformer-0 and
Metaformer-2 models were pretrained on iNaturalist2021 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. CAFormer-S18 was fine-tuned on a different
train-validation split than the two Metaformer models. Metaformer-0 and Metaformer-2 differ only in
the number of channels in the convolutional and transformer blocks, with Metaformer-2 having more
channels in every block. The S18 variant of CAFormer refers to a specific combination of convolutional
and self-attention token mixers. CAFormer-S18 utilizes a total of 18 blocks: 3 convolution blocks with 64
channels, 3 convolution blocks with 128 channels, 9 attention blocks with 320 channels, and 3 attention
blocks with 512 channels.</p>
      </sec>
      <sec id="sec-3-5b">
        <title>3.6. OpenGAN and OpenWGAN-GP</title>
        <p>To my knowledge, this is the first time that OpenGAN has been utilized for open-set recognition of
fine-grained images beyond digit recognition. In order to improve training stability, I incorporated
the Wasserstein GAN loss and gradient penalty (WGAN-GP) [26] into the training of OpenGAN [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]
to create OpenWGAN-GP. In addition to incorporating WGAN-GP, batch normalization layers were
replaced with layer normalization layers for the discriminator as suggested in [26].
        </p>
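<p>The gradient penalty incorporated from WGAN-GP [26] can be sketched generically as follows. This is the standard formulation from the WGAN-GP paper, not necessarily the exact implementation used here, and the function and variable names are mine:</p>

```python
import torch

def gradient_penalty(discriminator, real_feats, fake_feats, lam=10.0):
    """WGAN-GP penalty: push the critic's gradient norm toward 1 on
    random interpolations between real and fake feature vectors."""
    alpha = torch.rand(real_feats.size(0), 1)
    interp = (alpha * real_feats + (1 - alpha) * fake_feats).requires_grad_(True)
    scores = discriminator(interp)
    # differentiable gradient of the critic scores w.r.t. the inputs
    grads, = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```

<p>The penalty is added to the discriminator loss each update, which stabilizes training relative to the standard GAN objective.</p>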
        <p>OpenGAN proposes selection of the discriminator against a validation set of open- and closed-set
examples, selecting the discriminator with the best validation ROC-AUC. However, since the ROC-AUC
is calculated using a range of classification thresholds, the best classification threshold would also
need to be determined for each OpenGAN discriminator. As such, the macro-averaged F1 was used
as the selection metric instead of ROC-AUC for OpenWGAN-GP. Additionally, if the OpenWGAN-GP
discriminator is selected based on macro-F1, an ensemble can be averaged without the need to calibrate
the classification threshold.</p>
        <p>Since the OpenWGAN-GP models would be used in an ensemble and their probabilities averaged, it was
important that the classification threshold for all of the individual OpenWGAN-GP models be the same.
An alternative strategy would have been voting, which would have allowed for different classification
thresholds per model, but this was not explored in this study.</p>
        <p>OpenGAN is a methodology for training a lightweight discriminator that utilizes the intermediate
representation of an image to generate a binary classification for open-set recognition. Several related
methods were proposed, but the one that performed best in their experiments and the one that I focus
on in this work is OpenGAN fea with the inclusion of open-set training data, which I will simply refer
to as OpenGAN. This paradigm allows for training a classifier without initial consideration of
the open-set data. The discriminator is a multilayer perceptron consisting of fully connected layers
with sizes d → h × 8 → h × 4 → h × 2 → h → 1, where d represents the dimension of the
intermediate representation from the closed-set classifier and h is a hidden dimension multiplier.
h = 64 unless otherwise specified. The output layer uses a sigmoid activation function. Batch
normalization [27] and LeakyReLU [28] are used between each dense layer. During training, the
generator generates a feature vector of length d from a 100-dimensional input vector with each value
sampled independently from a standard normal distribution (mean 0, variance 1). The generator is also a
multilayer perceptron with batch normalization and LeakyReLU. It has a similar architecture, but there
are some critical differences.</p>
        <p>[Figure 1: The OpenWGAN-GP training paradigm. The closed-set classifier produces intermediate
features for closed-set and open-set images, and the generator produces fake open-set features. The
discriminator is trained A) supervised, closed (0) vs. open (1), and B) adversarially, real (1) vs. fake (0).]</p>
        <p>The output dimension of the generator must match the input dimension
of the discriminator, both of which correspond to the dimension of the intermediate representation
of the closed-set classifier. As such, the generator is a multilayer perceptron with fully connected
layers of sizes n → h × 8 → h × 4 → h × 2 → h × 4 → d, where n = 100. The input dimension n
is arbitrary, but this work utilizes a 100-dimensional input vector. As with the discriminator, h = 64
unless otherwise specified. Additionally, the output activation used by the generator is Tanh instead of
a sigmoid activation. During training, the discriminator is trained to classify the closed- and open-set data
in a supervised manner with binary labels using binary cross entropy. The generator is used to generate
additional open set data and both the generator and discriminator are trained using the standard GAN
training paradigm [29]. As such, the discriminator is updated twice per update of the generator since
the discriminator is trained adversarially against the generator (real vs fake) as well as supervised
(open- vs closed-set). This training paradigm is illustrated in Figure 1. The Adam optimizer [30] was used with
a learning rate of 1e-4 for the discriminator and 2e-4 for the generator. The higher learning rate is used
for the generator to account for the 2:1 updates of the discriminator vs the generator.</p>
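<p>Under the dimensions given above, the two multilayer perceptrons can be sketched in PyTorch as follows. This is a simplified sketch: layer normalization is used in the discriminator as described for OpenWGAN-GP, and the exact normalization placement and LeakyReLU slope are assumptions on my part:</p>

```python
import torch
import torch.nn as nn

def disc_block(n_in, n_out):
    # layer normalization replaces batch normalization in the
    # OpenWGAN-GP discriminator, as suggested for WGAN-GP
    return nn.Sequential(nn.Linear(n_in, n_out),
                         nn.LayerNorm(n_out),
                         nn.LeakyReLU(0.2))

def gen_block(n_in, n_out):
    # the generator keeps batch normalization and LeakyReLU
    return nn.Sequential(nn.Linear(n_in, n_out),
                         nn.BatchNorm1d(n_out),
                         nn.LeakyReLU(0.2))

def make_discriminator(d, h=64):
    """d → h×8 → h×4 → h×2 → h → 1, with a sigmoid output."""
    return nn.Sequential(disc_block(d, h * 8), disc_block(h * 8, h * 4),
                         disc_block(h * 4, h * 2), disc_block(h * 2, h),
                         nn.Linear(h, 1), nn.Sigmoid())

def make_generator(d, n=100, h=64):
    """n → h×8 → h×4 → h×2 → h×4 → d, with a Tanh output."""
    return nn.Sequential(gen_block(n, h * 8), gen_block(h * 8, h * 4),
                         gen_block(h * 4, h * 2), gen_block(h * 2, h * 4),
                         nn.Linear(h * 4, d), nn.Tanh())
```
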
      </sec>
      <sec id="sec-3-6">
        <title>3.7. Metadata</title>
        <p>
          Metaformer-0 and Metaformer-2 allow the fusion of metadata information with the vision
information. Metadata was utilized to provide the model with information concerning location, local growth
conditions, and temporal information by including the country code, substrate, habitat, and
observation date (month and day). Example substrates include “fruits”, “wood”, “cones”, “soil”, and “peat
mosses”, while example habitats include “bog”, “dune”, “meadow”, and “roof”. There are 34, 32, and
31 categories for country code, substrate, and habitat, respectively. Metadata was preprocessed for
Metaformer-0 and Metaformer-2 according to [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The month and day were transformed by periodic
encoding into [sin(2π · month/12), cos(2π · month/12)] and [sin(2π · day/31), cos(2π · day/31)], respectively,
to preserve temporal relationships. Geographical information in the form of country codes, habitat, and substrate
were all one-hot encoded. Metaformer-0 and Metaformer-2 utilize trainable embeddings to project this
encoded metadata to the same dimensionality as the image features in order to fuse them with the
latent representation of the images.
        </p>
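<p>The periodic date encoding described above can be sketched as follows (the function names are mine):</p>

```python
import math

def periodic_encode(value, period):
    """Encode a cyclic value (e.g. month of year, day of month) as
    [sin, cos] so that the end of a cycle stays close to its start."""
    angle = 2 * math.pi * value / period
    return [math.sin(angle), math.cos(angle)]

def encode_date(month, day):
    # month in 1..12 and day in 1..31, as described in the text
    return periodic_encode(month, 12) + periodic_encode(day, 31)
```

<p>With this encoding, December is as close to January as any other pair of adjacent months, which a raw integer encoding would not capture.</p>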
      </sec>
      <sec id="sec-3-7">
        <title>3.8. Training settings</title>
        <p>I used the AdamW optimizer [31] with an initial learning rate of 1e-3 on only the classification output
dense layer with the pretrained model frozen for the first 5 epochs, then reduced to 5e-5 for subsequent
epochs. CAFormer-S18 models were trained with a batch size of 40, while Metaformer-0 models were
trained with a batch size of 32, and Metaformer-2 models were trained with a batch size of 12. A weight
decay of 0.05 was used for all of these models. The learning rate was reduced by a factor of 0.1 if the
model did not improve the validation loss for 5 consecutive epochs. Early stopping was also employed
when training all models to prevent wasting compute time on models which were no longer improving
in their generalization to the validation set as measured by the validation loss.</p>
        <p>LogitNorm [32] describes a technique that applies an L2 norm to the logits during training (the norm
is not applied during inference). It was included with the hope that it would improve the separation of
the classes in the embedding space relative to standard seesaw loss, which might enable OpenWGAN-GP
to leverage the embedding space for more accurate open-set recognition. Additionally, LogitNorm was
shown to act similarly to temperature scaling [33] to create models that generate less overconfident
predictions. This would be important for maximum softmax probability or entropy thresholding, which
were explored as alternatives to OpenWGAN-GP.</p>
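<p>LogitNorm can be sketched in a few lines; the temperature value below is purely illustrative, not the value used in this work:</p>

```python
import torch

def logit_norm(logits, t=0.04, eps=1e-7):
    """L2-normalize logits during training (LogitNorm). The norm is
    applied only during training, not at inference; t is a temperature."""
    norms = torch.norm(logits, p=2, dim=-1, keepdim=True) + eps
    return logits / (norms * t)
```

<p>Dividing by the logit norm caps the magnitude of the logits, which discourages the overconfident predictions that plain cross-entropy tends to produce.</p>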
      </sec>
      <sec id="sec-3-8">
        <title>3.9. Training data augmentation</title>
        <p>Training was performed with a square random crop, TrivialAugment [34], horizontal flip with 50%
probability, and GridMask [35] with a probability of 20%, applied in that order.</p>
      </sec>
      <sec id="sec-3-9">
        <title>3.10. Test-time augmentations</title>
        <p>At test-time, all images were resized with bicubic interpolation along the shortest dimension to 384 or
576 depending on the model followed by a square center crop of the same size. Test-time augmentations
and ensembling were instrumental techniques for the final inference performance. Using a larger image
size for inference than training was shown to improve accuracy for multiple datasets in FixRes [36].</p>
        <p>The strategies employed were averaging horizontal flips, multi-instance averaging, ensemble
averaging, and inference at a higher resolution (576x576) for CAFormer-S18 relative to training (384x384). The
overall inference pipeline is summarized in Figure 2. FiveCrop was also investigated, but could not be
incorporated in the allowed compute budget. Since it did not yield as significant an improvement relative
to a larger ensemble with horizontal flipping according to local evaluation and public leaderboard scores
for Track 3, the chosen configuration was preferred. It could be useful in the future to experiment with
ensembling techniques that are more sophisticated than simple averaging, but none were attempted in
this study.</p>
        <p>Due to the open set “unknown” class being implicitly edible, the penalty for misclassifying poisonous
mushrooms as unknown was greater than the decrease in misclassification loss. To mitigate this in the
proposed solution, if the top prediction of the classification network was a poisonous mushroom, the
prediction from the OpenWGAN-GP open-set classifier was ignored for that observation.</p>
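<p>This override can be sketched as follows. The variable names and the unknown label are hypothetical; a fixed 0.5 threshold on the open-set probability is assumed, matching the untuned threshold discussed in the results:</p>

```python
def final_prediction(class_probs, open_prob, poison_classes,
                     unknown_label=-1, open_threshold=0.5):
    """Combine closed-set and open-set predictions; ignore the open-set
    classifier when the top closed-set prediction is poisonous."""
    top = max(range(len(class_probs)), key=class_probs.__getitem__)
    if top in poison_classes:
        return top  # never override a poisonous prediction with 'unknown'
    if open_prob > open_threshold:
        return unknown_label
    return top
```
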
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental results</title>
      <p>My best-performing ensemble and OpenWGAN-GP combinations achieved 1st place on Track 1, F1, and
Accuracy on the private leaderboard. All results reported are for the public leaderboard test set unless
explicitly stated otherwise.</p>
      <sec id="sec-4-1">
        <title>4.1. Open-set recognition</title>
        <p>Different open-set detection methods were evaluated against the public leaderboard and these results
can be seen in Table 1. Experiments with a local validation set showed that temperature scaling
improved the performance of maximum softmax probability (MSP) thresholding as well as softmax
entropy thresholding (experiments not shown).</p>
        <p>[Figure 2: a) The inference pipeline. The closed-set classifier produces penultimate-layer features
that feed the open-set discriminator. Horizontal flipping and multi-instance averaging produce averaged
closed-set multiclass softmax probabilities and averaged open-vs-closed probabilities (x6), which are
ensemble-averaged into the final closed-set and open-vs-closed probabilities. b) The classification decision.]</p>
        <p>OpenWGAN-GP consistently performed better on
the leaderboard than softmax thresholding or entropy thresholding, even after temperature scaling
[33] the probabilities. As can be seen in Table 1, the performance of either of these methods depends
on optimizing a threshold. The optimal entropy threshold for local validation was 6, which did not
appear to be optimal for the public leaderboard. This suggests that this method may not generalize
well between test sets. OpenWGAN-GP is a binary classifier that is selected using the macro-F1 score
with a classification threshold of 0.5, which means that no additional thresholding should be needed
to generalize between test sets. Table 1 shows that despite not tuning the classification threshold,
OpenWGAN-GP shows the best Track 1 and Track 3 performance while maintaining a similar Track 2
performance to MSP and entropy thresholding after avoiding poisonous → unknown misclassification
as explained below.</p>
        <p>It can be seen from the results in Table 1 that ignoring OpenWGAN-GP predictions for open-set
recognition in the cases when the highest predicted probability belongs to a poisonous species (“ignore
poison pred”) is critical to preventing the open-set recognition from degrading performance on Track 3.
OpenWGAN-GP without ignore poison pred achieves a better Track 1 score than OpenWGAN-GP with
ignore poison pred, but a much higher Track 2 score. This suggests that in many cases OpenWGAN-GP
is correctly identifying unknowns that the classification network is predicting to be poisonous, but
that the poisonous → edible cost for the poisonous closed → open misclassifications overwhelms the
improvement in classification loss. This reinforces how challenging it is to simultaneously optimize
classification performance, identification of unknown species, and avoidance of misclassifying poisonous
species as edible.</p>
        <p>
          Following [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], I explored fine-tuning the models through outlier exposure after first training the
models without the inclusion of unknowns, but validation loss failed to improve after the first epoch
upon inclusion of unknowns (results not shown). It appears that unknowns were included for the entire
duration of training in their work. Unfortunately, I was unable to complete this experiment before the
competition concluded.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Model architectures and ensemble selection</title>
        <p>Multiple architectures were evaluated for this study. EfficientNet-B0 [37] and EfficientNetV2-S [38] were
both experimented with, but their results were not as promising as Metaformer and CAFormer against
a local validation set in early experiments. Results are not shown for these experiments since they
are not directly comparable to the experiments reported. CAFormer-S18 has better performance than
Metaformer-0 when the image resolution is increased at inference time despite belonging to the same
family of models as Metaformer-0 and Metaformer-2, which leverage metadata information. CAFormers
performed almost as well as Metaformer despite not utilizing information from the metadata. Future
work could evaluate merging the two architectures into a CAFormer with a head for the metadata
information. An ensemble of three CAFormer-S18 models that vary only by their training and validation
data split performs nearly as well as ensembles of Metaformer-0, Metaformer-2, and CAFormer-S18. The
best performing ensemble was Metaformer-0, Metaformer-2, CAFormer-S18 split C. Split C performs
better than the other two CAFormer-S18 data splits, which provides a likely explanation as to why this
ensemble outperformed Metaformer-0, Metaformer-2, and CAFormer-S18 split A.</p>
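        <p>Ensemble predictions can be combined by averaging per-class probabilities across members; a minimal sketch assuming a simple unweighted mean (the weighting scheme is an assumption, not specified above):</p>

```python
def ensemble_mean(member_probs):
    # member_probs: one probability vector per model, same class order.
    # Returns the per-class mean over members (unweighted; assumed form).
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_models for c in range(n_classes)]
```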
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Optimization of OpenWGAN-GP training</title>
        <p>OpenWGAN-GP training was optimized with respect to the hidden dimension size, ratio of closed and
open set samples, and whether training augmentations were applied. Table 3 shows that the best Track
3 performance is achieved when using training augmentations and oversampling the open-set data
to roughly the same number of samples as the closed-set data. This configuration is denoted weighted
undersampling (w.u.) for the closed-set sampling and “3x all” for the open-set sampling, i.e., the full
open-set dataset is oversampled 3 times with training augmentations to increase the diversity of
representations of the limited open-set data. These settings are used for
all results shown in Table 2 and the OpenWGAN-GP results in Table 1. For sampling the closed-set
data, weighted undersampling outperforms random undersampling and balanced undersampling. In
cases where there is a &gt;5% disparity between the number of samples in the open and closed sets,
training is performed with balanced sampling between the open and closed sets. This pertains to all
combinations except the 3x oversampling of the open set and weighted undersampling of the closed set.
Local experimentation suggested that using the entire closed set dataset could yield a slight increase in
Track 3 performance, but this dramatically increases the training time for the OpenWGAN-GP classifier
(experiments not shown). While the Track 3 performance does not appear to be particularly sensitive
to the hidden dimension size, the trend suggests that a smaller hidden dimension may have slightly
improved performance, as shown in Table 4. More exhaustive combinations could not be performed
due to competition submission limitations.</p>
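        <p>The sampling scheme above, repeating the open set three times and undersampling the closed set to a comparable size with per-class weights, might be sketched as follows (the inverse-frequency form of the weights is an assumption; the exact weighting is not specified here):</p>

```python
import random

def build_epoch_indices(closed_idx, open_idx, class_of, seed=0):
    rng = random.Random(seed)
    # "3x all": the full open set, repeated three times (training
    # augmentations applied downstream give each repeat a distinct view).
    open_part = list(open_idx) * 3
    # Weighted undersampling of the closed set: draw as many samples as
    # the open part, weighting inversely by class frequency (assumed).
    counts = {}
    for i in closed_idx:
        counts[class_of[i]] = counts.get(class_of[i], 0) + 1
    weights = [1.0 / counts[class_of[i]] for i in closed_idx]
    closed_part = rng.choices(closed_idx, weights=weights, k=len(open_part))
    return open_part + closed_part
```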
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Test-time augmentations</title>
        <p>Several test-time augmentations were evaluated. The performance of each of these augmentations is
shown in Table 5. Multi-instance averaging, averaging of horizontal flips of the same image, averaging
of multiple crops of the same image, and averaging multiple image sizes are explored with Metaformer-0
trained on split D. Since Metaformer-0 does not support inference at a different resolution than the
training resolution (in this case 384x384), the image size in Table 5 refers to the image resolution of the
shorter dimension before a square crop of 384. For example, if the image size is 441, then the image is
resized to 441 along the shorter dimension (assuming it is a rectangular image) and then a square center
crop of 384 is taken. As such, image size must be at least 384 for Metaformer-0. Each augmentation
improves performance individually and in combination. Of the test-time augmentations that were
experimented with, multi-instance averaging has the greatest impact of any individual transformation.</p>
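        <p>The resize-then-center-crop geometry described above can be computed directly; a small sketch (the helper name is illustrative):</p>

```python
def resize_and_crop_box(width, height, resize_short=441, crop=384):
    # Scale so the shorter side equals resize_short, then compute the
    # square center-crop box of side `crop`, as in the 441 -> 384 example.
    scale = resize_short / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    left, top = (new_w - crop) // 2, (new_h - crop) // 2
    return (new_w, new_h), (left, top, left + crop, top + crop)
```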
        <p>Track 3 performance is best for multi-instance averaging in combination with horizontal flip
averaging of the combinations shown in Table 5.</p>
        <p>Despite improved performance on local validation, increasing the image resolution to 576 relative to
the training resolution of 384 appears to have mixed results on the leaderboard performance as shown
in Table 6. For data split A, the Track 1 score is improved with a higher resolution, but Track 2, Track 3,
and F1 performance are better with an image resolution of 384. Image resolution 384 seems favored
overall.</p>
        <p>Table 7 shows that the removal of poison loss degrades the performance of the model across all
metrics. Track 2 has a larger percent change than Track 1 or F1, which is sensible given that Track 2
corresponds to the edible → poisonous confusion loss.</p>
        <p>Removal of LogitNorm from the training decreased performance for Track 1 and F1, as shown in
Table 8, presumably because LogitNorm improves the separation of the classes in the latent space
used by OpenWGAN-GP for unknown classification. Future work could explore whether LogitNorm
also improves classification of fine-grained datasets in cases for which open-set recognition is not
a consideration. Interestingly, removal of LogitNorm increases performance in Track 2, as shown in
Table 8. The gain in Track 2 performance from the removal of LogitNorm is great enough that Track
3 (the sum of the Track 1 and Track 2 losses) is improved. This may suggest that LogitNorm is
incompatible with the poison loss used in this work.</p>
      </sec>
      <sec id="sec-4-4-1">
        <title>4.5. OpenWGAN-GP</title>
        <p>The training stability and overall performance of OpenWGAN-GP is demonstrated relative to the
original OpenGAN in Table 9. The identical performance of OpenGAN with and without these training
optimizations suggests that the particular failure state observed is OpenGAN classifying none of the test
set observations as open-set. The data sampling optimizations explored in Table 3 were not sufficient to
overcome the failure of OpenGAN to learn a meaningful representation of the data for classification. The
switch from ROC-AUC to F1 as the discriminator selection metric against the validation set apparently
also did not make a difference in light of the training failure. Since the same CAFormer-S18 classification
model was used to generate the embeddings used by the OpenGAN variants shown above, it appears
that the improved Track 1 (and consequently Track 3) performance is the result of the addition of the
WGAN-GP training paradigm to OpenGAN.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.6. Leaderboard performance</title>
        <p>Public leaderboard performance is shown in Table 10 for teams that have selected models for private
leaderboard evaluation. Private leaderboard performance for the selected models for each team are
shown in Table 11. The best performance for each metric is independently reported for each team,
which means that results for each team may represent distinct solutions for each metric. My models
achieved the best performance for Track 1 and accuracy in both the public and private leaderboards and
the best F1 for the private leaderboard. My models placed 3rd on the private leaderboard for Track 2
and 2nd for Track 3, indicating that the poisonous → edible misclassification could be improved for the
methods presented here.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The proposed methodology for open-set recognition of fungi species addresses the critical challenge of
distinguishing between edible and poisonous mushrooms while effectively identifying unknown species.
This study demonstrates the potential of combining Metaformer and CAFormer models to achieve robust
classification performance. The integration of metadata in Metaformer models significantly enhances
the model’s ability to leverage additional contextual information, thereby improving classification
accuracy. However, one notable challenge is the current evaluation metrics, which assume unknown
species are edible. This assumption may not be ideal if the models are intended to be used for foraging
contexts, where new poisonous species of mushrooms are continually discovered. Re-evaluating these
metrics to consider unknown species as potentially poisonous could mitigate the tension between
open-set classification and the misclassification of poisonous species, thereby enhancing the practical
applicability of the models in real-world scenarios. The current structure of the metrics, which treats
unknown species as edible, puts open-set recognition and poisonous species identification in direct
opposition: misclassifying a closed-set poisonous species as unknown is heavily
penalized, making their joint optimization challenging. If the intention is to build a system which
displays both high detection rates for unknown species and high recall for poisonous mushrooms,
the high penalty for poisonous → edible misclassification would work in favor of rather than against
identification of unknown species if unknowns were assumed poisonous instead of edible. This may
ultimately improve the model’s performance in both aspects.</p>
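        <p>The tension described above can be made concrete with a toy confusion cost; the cost values and the unknown label below are illustrative, not the official FungiCLEF weights:</p>

```python
def confusion_cost(y_true, y_pred, poisonous, unknown=-1,
                   cost_pe=100.0, cost_ep=1.0, unknown_is_edible=True):
    # poisonous -> edible errors cost cost_pe; edible -> poisonous errors
    # cost cost_ep. The flag controls whether "unknown" is scored as
    # edible (the competition's current assumption) or as poisonous.
    total = 0.0
    for t, p in zip(y_true, y_pred):
        true_pois = t in poisonous
        pred_pois = p in poisonous or (p == unknown and not unknown_is_edible)
        if true_pois and not pred_pois:
            total += cost_pe
        elif pred_pois and not true_pois:
            total += cost_ep
    return total / len(y_true)
```

        <p>With unknowns scored as poisonous, flagging a closed-set poisonous species as unknown no longer incurs the large penalty, which is the re-evaluation argued for above.</p>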
      <p>During OpenWGAN-GP training of the open-set discriminator, the same label was used for real features
and open-set features. All results shown utilizing OpenWGAN-GP in Section 4 represent cases when
the open-set label and real label are shared during training of the open-set discriminator. By assigning
the same label to real features and open-set features during the supervised and adversarial phases of
each update respectively, the generator is incentivized to generate features that are indistinguishable
from features created by the closed-set classifier for open-set observations. This is the opposite of the
mapping used in the implementation of OpenGAN, which shared the closed-set label with the real label
and would have the effect of generating supplemental closed-set features instead. Initial experiments
showed that a higher macro-F1 score was achieved between open- and closed-set validation examples
for the FungiCLEF 2024 dataset when the real label was shared with the open-set label rather than
the closed-set label. If the open-set label is the same as the real label, as in this work, the generator
generates fake open-set features and the discriminator predicts less realistic features as closed-set
features. Intuitively, the open set should be more diverse than the closed set, so sharing its label with
the inherently less realistic generated features seems the more logical choice in most scenarios where
greater diversity is expected in the open-set data.</p>
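        <p>The label mapping discussed in this section might be sketched as a small target function (the 0/1 values are illustrative; only the sharing pattern matters):</p>

```python
def disc_target(kind, open_shares_real=True):
    # Adversarial phase: real classifier features get 1, generator
    # ("fake") features get 0. Supervised phase: with
    # open_shares_real=True (as in this work), open-set features share
    # the real label 1, pushing the generator toward open-set-like
    # features; the original OpenGAN instead shared 1 with the closed set.
    if kind == "fake":
        return 0
    if kind == "real":
        return 1
    if open_shares_real:
        return 1 if kind == "open" else 0
    return 1 if kind == "closed" else 0
```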
    </sec>
    <sec id="sec-6">
      <title>6. Future work</title>
      <p>Future research should explore the redefinition of evaluation metrics to account for the possibility of
unknown species being poisonous. This adjustment could reduce the conflict between optimizing for
open-set classification and minimizing poisonous species misclassification. Additionally, investigating
more sophisticated ensembling techniques and incorporating advanced data augmentation strategies
could further improve model performance. Exploring the use of few-shot learning techniques might
address the challenge posed by classes with very few observations. Finally, expanding the application
of the proposed OpenWGAN-GP framework to other domains with similar classification challenges
could validate its versatility and robustness.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>This paper presents a novel method for open-set recognition of fungi species. The integration of
WGAN-GP training optimizations into OpenGAN, resulting in OpenWGAN-GP, enhances training
stability and enables lightweight discriminators to effectively identify unknown fungi species. An
ensemble of Metaformer and CAFormer models is leveraged to classify fungi accurately while avoiding
the misclassification of poisonous mushrooms as edible. The application of carefully chosen test-time
augmentations, such as image resolution adjustments, horizontal flipping, and multi-instance
averaging, dramatically improves classification performance. These techniques collectively contributed
to achieving 1st place in the FungiCLEF 2024 competition for Track 1, F1, and Accuracy and 2nd place
for the final ranking metric Track 3, which combines the edible → poisonous confusion loss (Track 2)
with the standard misclassification loss including the unknown class (Track 1).</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The author would like to thank Jillian Etheredge for constructive criticism of the manuscript.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lang</surname>
          </string-name>
          , B. Cheng, J. Zhong, Z. Li, C. Sun,
          <article-title>Mushroom Poisoning Outbreaks - China, 2023</article-title>
          ,
          <source>China CDC Weekly</source>
          <volume>6</volume>
          (
          <year>2024</year>
          )
          <fpage>64</fpage>
          -
          <lpage>68</lpage>
          . URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10832152/. doi:10.46234/ccdcw2024.014.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <article-title>Overview of FungiCLEF 2024: Revisiting fungi species recognition beyond 0-1 cost</article-title>
          ,
          <source>in: Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Espitalier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Estopinan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hrúz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          , et al.,
          <article-title>Overview of LifeCLEF 2024: Challenges on species distribution prediction and identification</article-title>
          ,
          <source>in: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Diao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>MetaFormer: A Unified Meta Framework for Fine-Grained Recognition</article-title>
          ,
          <year>2022</year>
          . URL: http://arxiv.org/abs/2203.02751. doi:10.48550/arXiv.2203.02751, arXiv:2203.02751 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Van Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. Mac</given-names>
            <surname>Aodha</surname>
          </string-name>
          ,
          <article-title>iNat Challenge 2021 - FGVC8</article-title>
          . Kaggle (
          <year>2021</year>
          ). URL: https://kaggle.com/competitions/inaturalist-2021.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Van Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Branson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Farrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Haber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ipeirotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <article-title>Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection</article-title>
          ,
          <source>2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          (
          <year>2015</year>
          )
          <fpage>595</fpage>
          -
          <lpage>604</lpage>
          . URL: http://ieeexplore.ieee.org/document/7298658/. doi:10.1109/CVPR.2015.7298658. Boston, MA, USA: IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Branson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Welinder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <source>The Caltech-UCSD Birds-200-2011 Dataset</source>
          ,
          <year>2011</year>
          . URL: https://api.semanticscholar.org/CorpusID:16119123.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chamidullin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <article-title>Overview of FungiCLEF 2023: Fungi Recognition Beyond 1/0 Cost</article-title>
          ,
          <source>in: Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <source>Learning Multiple Layers of Features from Tiny Images</source>
          ,
          <year>2009</year>
          . URL: https://api.semanticscholar.org/CorpusID:18268744.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Tiny ImageNet Visual Recognition Challenge</article-title>
          ,
          <year>2015</year>
          . URL: https://api.semanticscholar.org/CorpusID:16664790.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Yann</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Corinna</given-names>
            <surname>Cortes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Burges</surname>
          </string-name>
          ,
          <article-title>MNIST handwritten digit database</article-title>
          ,
          <source>ATT Labs [Online]. 2</source>
          (
          <year>2010</year>
          ). URL: http://yann.lecun.com/exdb/mnist.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Netzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Coates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bissacco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <source>Reading Digits in Natural Images with Unsupervised Feature Learning</source>
          ,
          <year>2011</year>
          . URL: https://api.semanticscholar.org/CorpusID:16852518.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ramanan</surname>
          </string-name>
          ,
          <article-title>OpenGAN: Open-Set Recognition via Open Data Generation</article-title>
          ,
          <year>2021</year>
          . URL: http://arxiv.org/abs/2104.02939. doi:10.48550/arXiv.2104.02939, arXiv:2104.02939 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hendrycks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <article-title>A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks</article-title>
          ,
          <source>ArXiv</source>
          (
          <year>2016</year>
          ). URL: https://api.semanticscholar.org/CorpusID:13046179.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hendrycks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Basart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mazeika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mostajabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Steinhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Scaling Out-of-Distribution Detection for Real-World Settings</article-title>
          ,
          <year>2022</year>
          . URL: https://api.semanticscholar.org/CorpusID:227407829.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vaze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Open-Set Recognition: A Good Closed-Set Classifier is All You Need?</article-title>
          ,
          <source>ArXiv abs/2110.06207</source>
          (
          <year>2021</year>
          ). URL: https://api.semanticscholar.org/CorpusID:238634102.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Entropy-guided open-set fine-grained fungi recognition</article-title>
          ,
          <source>in: Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2023</year>
          . URL: https://api.semanticscholar.org/CorpusID:264441405.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Scheirer</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. De Rezende Rocha</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Sapkota</surname>
            ,
            <given-names>T. E.</given-names>
          </string-name>
          <string-name>
            <surname>Boult</surname>
          </string-name>
          , Toward Open Set Recognition,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>35</volume>
          (
          <year>2013</year>
          )
          <fpage>1757</fpage>
          -
          <lpage>1772</lpage>
          . URL: http://ieeexplore.ieee.org/document/6365193/. doi:10.1109/TPAMI.2012.256.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heilmann-Clausen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Jeppesen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Laessøe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Frøslev</surname>
          </string-name>
          ,
          <article-title>Danish Fungi 2020 - Not Just Another Image Recognition Dataset</article-title>
          ,
          <source>in: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3281</fpage>
          -
          <lpage>3291</lpage>
          . URL: http://arxiv.org/abs/2103.10107. doi:10.1109/WACV51458.2022.00334, arXiv:2103.10107 [cs, eess].
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Loy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Seesaw Loss for Long-Tailed Instance Segmentation</article-title>
          ,
          <year>2021</year>
          . URL: http://arxiv.org/abs/2008.10032. doi:10.48550/arXiv.2008.10032, arXiv:2008.10032 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>MetaFormer is Actually What You Need for Vision</article-title>
          ,
          <source>2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          (
          <year>2022</year>
          )
          <fpage>10809</fpage>
          -
          <lpage>10819</lpage>
          . URL: https://ieeexplore.ieee.org/document/9879612/. doi:10.1109/CVPR52688.2022.01055. Conference name: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), ISBN: 9781665469463, place: New Orleans, LA, USA, publisher: IEEE.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>