<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Solution for SnakeCLEF 2022 by Tackling Long-tailed Categorization</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Lingfeng</forename><surname>Yang</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Nanjing University of Science and Technology</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Megvii Technology</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Xiang</forename><surname>Li</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Nankai University</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Renjie</forename><surname>Song</surname></persName>
							<email>songrenjie@megvii.com</email>
							<affiliation key="aff1">
								<orgName type="department">Megvii Technology</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kexin</forename><surname>Zhu</surname></persName>
							<email>zhukexin@megvii.com</email>
							<affiliation key="aff1">
								<orgName type="department">Megvii Technology</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Gang</forename><surname>Li</surname></persName>
							<email>gang.li@njust.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="institution">Nanjing University of Science and Technology</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="department">Evaluation Forum</orgName>
								<address>
									<addrLine>September 5-8</addrLine>
									<postCode>2022</postCode>
									<settlement>Bologna</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Solution for SnakeCLEF 2022 by Tackling Long-tailed Categorization</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">E3F4DF89B659CEF6E76032559B730CE2</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T03:25+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>SnakeCLEF</term>
					<term>Fine-grained image classification</term>
					<term>Masked autoencoder</term>
					<term>Metadata</term>
					<term>Long-tailed distribution</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>SnakeCLEF 2022 is a fine-grained image classification benchmark for snake identification. Recently, the masked autoencoder (MAE) has shown superior performance on fine-grained image classification tasks. We therefore take MAE-pretrained ViT models and fine-tune them on the SnakeCLEF 2022 dataset. Overall, the learning process presents two difficulties: 1) dealing with fine-grained species that are visually similar, and 2) a long-tailed distribution. To address these issues, we propose statistic-aware post-processing that uses the metadata to refine image predictions. We further propose an effective logit adjustment loss (ELAL) to alleviate the classification bias toward the head classes. Notably, we achieve 2nd place on the SnakeCLEF 2022 benchmark with a 0.84565 top F1 score. Codes and models are available at https://github.com/ylingfeng/snakeclef2022_fgvc9.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Fine-grained visual categorization <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref> is a popular task that identifies fine categories within coarse divisions. Recently, there has been an increasing need for fine-grained visual categorization algorithms covering the various species of snakes, with applications in biodiversity, conservation, and global health. The SnakeCLEF 2022 benchmark <ref type="bibr" target="#b7">[8]</ref> addresses this need; it is organized by LifeCLEF <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref> jointly with FGVC9 at CVPR 2022.</p><p>The difficulty of fine-grained snake identification lies in the high intra-class and low inter-class differences in appearance: many species are visually similar to others. Moreover, the species distribution across geographical locations is irregular; some countries (e.g., the US) contain hundreds of species, while others (e.g., the Vatican) host only a few. In addition, the dataset suffers from a severe long-tailed problem in which two-thirds of the categories contain fewer than 100 instances.</p><p>We propose to solve these problems individually. First, for the visually similar samples that confuse image-only predictions, we utilize the metadata <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16]</ref> provided with the dataset to form a prior distribution over all species. 
Different from previous multi-modal methods, which embed the metadata into the feature space, we design a parameter-free post-processing structure to refine the predictions. Specifically, we record the number of occurrences of each metadata value for each species as the prior. More details can be found in Sec. 4.1. Secondly, in Sec. 4.2 we propose the effective logit adjustment loss (ELAL), which alleviates the prediction bias that arises when training on long-tailed samples by increasing the optimization weight of the tail classes while reducing that of the head classes.</p><p>Our contributions can be summarized as follows:</p><p>• We introduce a new way to process the metadata by recording statistics for each category, and design a post-processing algorithm to refine the image predictions. • We propose the effective logit adjustment loss (ELAL) to alleviate the prediction bias resulting from the long-tailed dataset. • Based on our algorithm, we achieve 2nd place on the SnakeCLEF 2022 benchmark with a 0.84565 top F1 score.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Fine-grained image classification: To recognize fine-grained categories that are hard to distinguish through visual cues alone, there are three workable solutions: 1) Detect the discriminative regions of an image and pass all parts through the network for joint classification <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>. 2) Design a robust feature-extraction architecture that captures the subtle representations of an image <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b18">19]</ref>. 3) Utilize the metadata (e.g., shooting date, latitude, longitude, country, and a brief description of the image) <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16]</ref>. However, region detectors and specialized feature extractors are heavily engineered and thus not suitable for our task. Meanwhile, existing metadata fusion methods all handle the multi-modal features by embedding them into higher-level semantic representations before interaction. 
Specifically, in SnakeCLEF 2022, the types of metadata are discrete (e.g., country, endemic, and code), unlike the continuous latitude, longitude, or date assumed in previous works.</p><p>To make use of this metadata, we record, for every country value, which species occur in that country, and form the prior matrix over all species.</p><p>Long-tailed distribution: For long-tailed classification, data re-sampling <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21]</ref> seeks to change the class sampling probability based on the number of samples to obtain a class-balanced dataset, which includes over-sampling and under-sampling. <ref type="bibr" target="#b21">[22]</ref> develops a two-stage paradigm that re-balances the classifier in the second stage with a frozen backbone. Re-weighting <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b23">24,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b25">26]</ref> assigns the loss weight class-wise to reduce the optimization bias between head and tail classes. The logit adjustment loss <ref type="bibr" target="#b22">[23]</ref> encourages a large relative margin between the logits of rare versus dominant labels. Based on this work, we modify the margin coefficient and propose an effective logit adjustment loss (ELAL) to solve the long-tailed problem efficiently. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Task Description</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Dataset</head><p>The SnakeCLEF 2022 dataset <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10]</ref> comprises 318,532 photographs from 187,129 snake observations, covering 1,572 snake species observed in 208 countries. The data come from the online biodiversity platform iNaturalist. The dataset has a heavily long-tailed class distribution (see Fig. <ref type="figure">1</ref>): the most frequent species (Natrix natrix) is represented by 6,472 images, while the least frequent has just 5 samples.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Metric</head><p>The evaluation metric for this competition is Mean (Macro) F1-Score. The F1 score, commonly used in information retrieval, measures accuracy using the statistics precision (P) and recall (R).</p><p>The macro F1 score is not biased by class frequencies and is more suitable for the long-tailed class distributions observed in nature. This metric raises a higher requirement for classification accuracy on tailed categories.</p></div>
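The macro F1 metric described above can be sketched in a few lines of Python; the function name and list-based interface are our own illustration, not part of the competition kit:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute per-class F1, then average with equal
    weight per class, so rare (tail) classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

Because each class contributes equally to the mean, a model that ignores the 5-sample species pays the same macro-F1 penalty as one that fails on Natrix natrix.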
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Metadata-aware Post-processing</head><p>Given the metadata-label mapping, we count, for each metadata value, the number of instances of every category attached to it. We thus obtain the statistic in the form of a metadata-wise category matrix P ∈ R 𝑛×𝑐 , where 𝑛 is the number of distinct values within one type of metadata and 𝑐 is the number of classes. Next, we binarize P into a one-hot form P 𝑜 , the prior statistic, which indicates whether a specific category can appear under a certain metadata value. Finally, P 𝑜 is used to refine the prediction of the visual networks via the Hadamard product. The whole structure is illustrated in Fig. <ref type="figure">2</ref>.</p></div>
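The counting and refinement steps above can be sketched as follows. This is a minimal illustration under our own naming (`build_prior`, `refine`), assuming metadata values and class labels are integer-encoded:

```python
def build_prior(metadata_values, labels, n_values, n_classes):
    """Build the metadata-wise count matrix P (n x c), then binarize it
    into the one-hot prior P_o: P_o[v][c] = 1 iff class c was observed
    at least once with metadata value v in the training set."""
    P = [[0] * n_classes for _ in range(n_values)]
    for v, y in zip(metadata_values, labels):
        P[v][y] += 1
    return [[1.0 if count > 0 else 0.0 for count in row] for row in P]

def refine(pred, prior_row):
    """Refine an image prediction via the Hadamard (element-wise) product
    with the prior row matching the sample's metadata value, zeroing out
    classes never seen under that value."""
    return [p * m for p, m in zip(pred, prior_row)]
```

For example, a species never recorded in the sample's country keeps a zero score regardless of its visual-network confidence.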
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Effective Logit Adjustment Loss</head><p>In this section, we introduce our new effective logit adjustment loss (ELAL), which addresses the performance drop caused by the prediction bias of the long-tailed distribution. We first briefly review the existing loss functions, and then show how ELAL is developed from them. The vanilla softmax cross-entropy can be written as:</p><formula xml:id="formula_0">ℓ(𝑦, 𝑓(𝑥)) = log(1 + Σ_{𝑦′≠𝑦} 𝑒^{𝑓_{𝑦′}(𝑥) − 𝑓_𝑦(𝑥)}),<label>(1)</label></formula><p>where 𝑦 denotes the ground-truth label. The logit adjustment loss <ref type="bibr" target="#b22">[23]</ref> adds a label-dependent offset to each of the logits, modifying Eq. 1 with the shift coefficient 𝑀:</p><formula xml:id="formula_1">ℓ(𝑦, 𝑓(𝑥)) = log(1 + Σ_{𝑦′≠𝑦} 𝑀 · 𝑒^{𝑓_{𝑦′}(𝑥) − 𝑓_𝑦(𝑥)}),<label>(2)</label></formula><p>where 𝑀 = 𝜋_{𝑦′}/𝜋_𝑦, 𝜋_𝑦 = 𝑁_𝑦 / Σ_{𝑦′} 𝑁_{𝑦′} ∈ (0, 1), and 𝑁_𝑦 is the total number of instances in class 𝑦. The class-balanced loss <ref type="bibr" target="#b23">[24]</ref> proposes the effective number as a replacement for the raw label-wise instance count to represent the volume of samples:</p><formula xml:id="formula_2">𝐸_𝑦 = (1 − 𝛽^{𝑁_𝑦}) / (1 − 𝛽).<label>(3)</label></formula><p>Inspired by the effective number, which improves on the raw count, we modify the logit adjustment loss by changing the shift coefficient to 𝑀 = 𝜖_{𝑦′}/𝜖_𝑦, 𝜖_𝑦 = 𝐸_𝑦 / Σ_{𝑦′} 𝐸_{𝑦′} ∈ (0, 1), and thus obtain ELAL. Notably, we set 𝛽 = 1𝑒−6 by default in our experiments.</p></div>
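Under the definitions above, a minimal per-sample sketch of ELAL in plain Python (the function name and list-based interface are ours, not the authors' released code) looks like this; note that with a very small 𝛽 the effective numbers of all classes are nearly equal, so the demonstration below uses a larger 𝛽 to make the head/tail asymmetry visible:

```python
import math

def elal_loss(logits, label, class_counts, beta=1e-6):
    """Effective logit adjustment loss (sketch). The shift coefficient
    M = eps_{y'} / eps_y uses normalized effective numbers
    E_y = (1 - beta**N_y) / (1 - beta) in place of raw class counts."""
    # Effective number per class (Eq. 3), then normalize to eps_y.
    E = [(1 - beta ** n) / (1 - beta) for n in class_counts]
    total = sum(E)
    eps = [e / total for e in E]
    # Eq. 2 with the modified shift coefficient M = eps_{y'} / eps_y.
    s = sum((eps[yp] / eps[label]) * math.exp(logits[yp] - logits[label])
            for yp in range(len(logits)) if yp != label)
    return math.log(1 + s)
```

With equal class counts the coefficient M is 1 and the loss reduces to the vanilla cross-entropy of Eq. 1; when the ground-truth label is a tail class, M grows and the loss pushes the tail logit up harder.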
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experiments</head><p>In this section, we first elaborate on our experimental settings; then, ablation studies are conducted to demonstrate the contribution of each component. Finally, we list the top results of our methods and provide a detailed analysis. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Setup</head><p>In this paper, we use ViT <ref type="bibr" target="#b27">[28]</ref> models pretrained with the masked autoencoder (MAE) <ref type="bibr" target="#b26">[27]</ref> on the ImageNet-1K <ref type="bibr" target="#b28">[29]</ref> training set for 800 epochs. The fine-tuning code and checkpoints follow the MAE repository <ref type="foot" target="#foot_0">3</ref> . The ImageNet-1K dataset has 1.3M training images in 1K categories and 50K validation images. Notably, we do not use the larger ImageNet-22K (IN22K) dataset, which contains 14.2M images and 22K classes. Based on the MAE pretrained models, we fine-tune for 50 epochs on the SnakeCLEF 2022 dataset; the default settings are listed in Table <ref type="table" target="#tab_0">1</ref>. We randomly select 1/10 of the training dataset as a validation set for developing our algorithm, and the full set is used to train the models for the final submissions. Specifically, we set the batch size per GPU to 2 to avoid exceeding GPU memory. The effective learning rate is obtained following MAE: lr = base_lr × global_batch_size / 256. We apply random resized cropping, random horizontal flipping <ref type="bibr" target="#b29">[30]</ref>, label-smoothing regularization <ref type="bibr" target="#b30">[31]</ref>, Mixup <ref type="bibr" target="#b31">[32]</ref>, CutMix <ref type="bibr" target="#b32">[33]</ref>, RandomErasing <ref type="bibr" target="#b33">[34]</ref>, and RandAug <ref type="bibr" target="#b34">[35]</ref> as standard data augmentations. Notably, all ablation studies are conducted with ViT-L for fair comparison. The ViT-large and ViT-huge models are trained on eight NVIDIA TITAN Xp GPUs (12 GB) and eight GeForce RTX 3090 GPUs (24 GB), respectively.</p></div>
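The learning-rate scaling rule above can be written as a one-line helper (a sketch with our own function name; gradient accumulation is deliberately left out, since the table reports the global batch over 8 GPUs as 16):

```python
def effective_lr(base_lr, batch_per_gpu, n_gpus):
    """MAE linear lr scaling rule: lr = base_lr * global_batch_size / 256."""
    global_batch = batch_per_gpu * n_gpus
    return base_lr * global_batch / 256
```

With the paper's ViT-L setting (base_lr = 1e-4, batch 2 per GPU over 8 GPUs), the effective learning rate is 1e-4 × 16 / 256 = 6.25e-6.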
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Ablation Study</head><p>First, we compare the performance of post-processing with different sets of metadata. Table <ref type="table" target="#tab_1">2</ref> shows that refining predictions with the "endemic" and "code" metadata performs best. Next, we conduct an ablation on the two losses. Table <ref type="table" target="#tab_2">3</ref> shows that our ELAL achieves a higher F1 score under both input resolutions. To demonstrate the effectiveness of ELAL on the tail classes and its potential side effect on the head classes, we report the validation accuracy on the top 10/50/100/500 classes from the head and the tail, respectively (Table <ref type="table" target="#tab_3">4</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Results</head><p>Based on the strong ViT-L and ViT-H <ref type="bibr" target="#b27">[28]</ref>, we conduct experiments with input resolutions of 384/432/392 on the full training set. We adopt multi-crop <ref type="bibr" target="#b35">[36]</ref> as a test-time strategy: the given image is cropped into the four corners and the center crop, plus their flipped versions, and the predictions over all crops are averaged. The model ensemble averages the prediction scores of the selected models after the softmax. Our final submissions come from the ensembles of models w/o and w/ multi-crop, which receive 0.84409 and 0.84565 F1 scores on the private benchmark, respectively.</p></div>
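The ensembling step described above (averaging post-softmax scores over models, and identically over crops) can be sketched as follows; the helper names are ours:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble(prob_vectors):
    """Average probability vectors element-wise: each model (or crop)
    contributes one post-softmax score vector per image."""
    n = len(prob_vectors)
    return [sum(col) / n for col in zip(*prob_vectors)]
```

Averaging after the softmax, rather than averaging raw logits, keeps each model's contribution on the same probability scale regardless of its logit magnitudes.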
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">Analysis</head><p>We attempted to run ViT-H at a 448 resolution, which could theoretically reach higher accuracy; however, due to resource limitations we only present the result at a 392 resolution. We also notice that the effect of post-processing on the private benchmark is not as significant as on the public benchmark. We suspect there is a distribution gap between the training and test metadata, while the public benchmark is less affected.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this paper, we present our solution to the Snake Recognition Competition (SnakeCLEF 2022) at FGVC9, which is challenging due to fine-grained categorization and long-tailed classes. To deal with these difficulties, we utilize statistic-aware metadata post-processing to refine image predictions, and propose the effective logit adjustment loss (ELAL) to handle the long-tailed problem. Our team achieves 2nd place on the private benchmark with a 0.84565 top F1 score.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :Figure 2 :</head><label>12</label><figDesc>Figure 1: Visualization of the instance number for each class sorted by number in descending order.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Fine-tuning settings on the SnakeCLEF 2022 dataset.</figDesc><table><row><cell>Config</cell><cell>Value</cell></row><row><cell>optimizer</cell><cell>AdamW</cell></row><row><cell>base learning rate</cell><cell>1e-4 (ViT-L), 1e-3 (ViT-H)</cell></row><row><cell>weight decay</cell><cell>0.05</cell></row><row><cell>optimizer momentum</cell><cell>𝛽1, 𝛽2=0.9, 0.999</cell></row><row><cell>layer-wise lr decay</cell><cell>0.75 (ViT-L), 0.8 (ViT-H)</cell></row><row><cell cols="2">global batch size (over 8 GPUs) 16</cell></row><row><cell>batch size per GPU</cell><cell>2</cell></row><row><cell>accumulated iteration</cell><cell>4</cell></row><row><cell>learning rate schedule</cell><cell>cosine decay</cell></row><row><cell>warmup epochs</cell><cell>5</cell></row><row><cell>augmentation</cell><cell>RandAug (9, 0.5)</cell></row><row><cell>label smoothing</cell><cell>0.1</cell></row><row><cell>mixup</cell><cell>0.8</cell></row><row><cell>cutmix</cell><cell>1.0</cell></row><row><cell>random erase</cell><cell>0.25</cell></row><row><cell>drop path</cell><cell>0.2</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Ablation study on the performance of post-processing under different metadata combinations.</figDesc><table><row><cell>code</cell><cell cols="3">endemic country val acc val F1 test F1</cell></row><row><cell></cell><cell></cell><cell></cell><cell>80.470 0.758 0.755</cell></row><row><cell></cell><cell></cell><cell>✓</cell><cell>88.554 0.810 0.796</cell></row><row><cell>✓</cell><cell></cell><cell></cell><cell>88.613 0.856 0.834</cell></row><row><cell>✓</cell><cell>✓</cell><cell></cell><cell>89.949 0.873 0.864</cell></row><row><cell>✓</cell><cell>✓</cell><cell>✓</cell><cell>93.893 0.920 0.815</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Ablation study on the performance of the long-tailed loss. CE: Cross-entropy loss. ELAL: Effective logit adjustment loss.</figDesc><table><row><cell cols="2">resolution loss</cell><cell>val acc val F1 test F1</cell></row><row><cell>224</cell><cell cols="2">CE ELAL 0.915 0.892 0.756 0.858 0.821 0.735</cell></row><row><cell>384</cell><cell cols="2">CE ELAL 0.939 0.920 0.815 0.889 0.859 0.792</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Ablation study on the performance of the head and tail class. We depict the accuracy of the top 10/50/100/500 from the head/tail classes. CE: Cross-entropy loss. ELAL: Effective logit adjustment loss.</figDesc><table><row><cell>loss</cell><cell>class 10</cell><cell>50 100 500</cell></row><row><cell>CE</cell><cell cols="2">head 1.00 1.00 0.95 0.94 tail 0.30 0.46 0.56 0.79</cell></row><row><cell>ELAL</cell><cell cols="2">head 1.00 1.00 0.94 0.94 tail 0.90 0.82 0.88 0.93</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5</head><label>5</label><figDesc>Performance of the final submissions on public/private benchmarks.</figDesc><table><row><cell></cell><cell></cell><cell>center crop</cell><cell>multi crop</cell></row><row><cell>model</cell><cell cols="3">resolution public private public private</cell></row><row><cell>large</cell><cell>384</cell><cell cols="2">0.87134 0.81199 0.87996 0.81997</cell></row><row><cell>large</cell><cell>432</cell><cell cols="2">0.88375 0.82382 0.89173 0.83063</cell></row><row><cell>huge</cell><cell>392</cell><cell cols="2">0.89692 0.83662 0.89449 0.84057</cell></row><row><cell>ensemble</cell><cell>-</cell><cell cols="2">0.90245 0.84409 0.89822 0.84565</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">https://github.com/facebookresearch/mae</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Part-based r-cnns for fine-grained category detection</title>
		<author>
			<persName><forename type="first">N</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Donahue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Girshick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Darrell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECCV</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Evaluation of output embeddings for fine-grained image classification</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Akata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Reed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Walter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schiele</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The application of two-level attention models in deep convolutional neural network for fine-grained image classification</title>
		<author>
			<persName><forename type="first">T</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Learning to navigate for fine-grained classification</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECCV</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<author>
			<persName><forename type="first">D</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Bhunia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-Z</forename><surname>Song</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The devil is in the channels: Mutual-channel loss for fine-grained image classification</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<title level="m">Multi-branch and multi-scale attention learning for fine-grained visual categorization</title>
				<imprint>
			<publisher>MMM</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Behera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wharton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hewage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bera</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2101.06635</idno>
		<title level="m">Context-aware attentional pooling (cap) for fine-grained visual classification</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Overview of SnakeCLEF 2022: Automated snake species identification on a global scale</title>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Durso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hrúz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Bolon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2022 -Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Lifeclef 2022 teaser: An evaluation of machine-learning based species identification and species distribution prediction</title>
		<author>
			<persName><forename type="first">A</forename><surname>Joly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Goëau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lorieul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cole</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Deneu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Servajean</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Durso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Bolon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Conference on Information Retrieval</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="390" to="399" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Overview of LifeCLEF 2022: an evaluation of machine-learning based species identification and species distribution prediction</title>
		<author>
			<persName><forename type="first">A</forename><surname>Joly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Goëau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lorieul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cole</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Deneu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Servajean</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Durso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Glotin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Planqué</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-P</forename><surname>Vellinga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Navine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Klinck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Denton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Eggel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bonnet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Šulc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hrúz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Improving image classification with location context</title>
		<author>
			<persName><forename type="first">K</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Paluri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fei-Fei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bourdev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICCV</title>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Recommending plant taxa for supporting on-site species identification</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">C</forename><surname>Wittich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Seeland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wäldchen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rzanny</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mäder</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC bioinformatics</title>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Geo-aware networks for fine-grained recognition</title>
		<author>
			<persName><forename type="first">G</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Potetz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Howard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Leung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Adam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICCV</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Presence-only geographical priors for fine-grained image classification</title>
		<author>
			<persName><forename type="first">O</forename><surname>Mac Aodha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cole</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Perona</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICCV</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2203.03253</idno>
		<title level="m">Dynamic MLP for fine-grained image classification by leveraging geographical and temporal information</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">Q</forename><surname>Diao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yuan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2203.02751</idno>
		<title level="m">MetaFormer: A unified meta framework for fine-grained recognition</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z.-J</forename><surname>Zha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luo</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1911.03621</idno>
		<title level="m">Learning deep bilinear transformation for fine-grained image representation</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Channel interaction networks for fine-grained image categorization</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Scott</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<publisher>AAAI</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Grafit: Learning fine-grained image representations with coarse labels</title>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sablayrolles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Douze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cord</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jégou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICCV</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition</title>
		<author>
			<persName><forename type="first">B</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X.-S</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z.-M</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">SMOTE: synthetic minority over-sampling technique</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Chawla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">W</forename><surname>Bowyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">O</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">P</forename><surname>Kegelmeyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of artificial intelligence research</title>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rohrbach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gordo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kalantidis</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.09217</idno>
		<title level="m">Decoupling representation and classifier for long-tailed recognition</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Menon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jayasumana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Rawat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Veit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2007.07314</idno>
		<title level="m">Long-tail learning via logit adjustment</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Class-balanced loss based on effective number of samples</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Belongie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Equalization loss v2: A new gradient balance approach for long-tailed object detection</title>
		<author>
			<persName><forename type="first">J</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Luo</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2201.02593</idno>
		<title level="m">Equalized focal loss for dense long-tailed object detection</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Masked autoencoders are scalable vision learners</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dollár</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Girshick</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Dosovitskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Beyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kolesnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Weissenborn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Unterthiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dehghani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Minderer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Heigold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gelly</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.11929</idno>
		<title level="m">An image is worth 16x16 words: Transformers for image recognition at scale</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">ImageNet: A large-scale hierarchical image database</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fei-Fei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR</title>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Going deeper with convolutions</title>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sermanet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Reed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Anguelov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Erhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vanhoucke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rabinovich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR</title>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Rethinking the inception architecture for computer vision</title>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vanhoucke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ioffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shlens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wojna</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR</title>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cisse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">N</forename><surname>Dauphin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lopez-Paz</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1710.09412</idno>
		<title level="m">mixup: Beyond empirical risk minimization</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">CutMix: Regularization strategy to train strong classifiers with localizable features</title>
		<author>
			<persName><forename type="first">S</forename><surname>Yun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Oh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Choe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yoo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICCV</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<title level="m" type="main">Random erasing data augmentation</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<publisher>AAAI</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">D</forename><surname>Cubuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zoph</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shlens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<title level="m">RandAugment: Practical automated data augmentation with a reduced search space</title>
		<imprint>
			<publisher>CVPRW</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<title level="m" type="main">ImageNet classification with deep convolutional neural networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<publisher>NeurIPS</publisher>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
