<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Multimodal networks for Species Distribution Modeling Notebook for the LifeCLEF Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Aman</forename><forename type="middle">R</forename><surname>Syayfetdinov</surname></persName>
							<email>asysyfetdinov@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Moscow Institute of Physics and Technology (MIPT)</orgName>
								<address>
									<settlement>Dolgoprudny</settlement>
									<country key="RU">Russian Federation</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Multimodal networks for Species Distribution Modeling Notebook for the LifeCLEF Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">4EAC6110A0ED959F6EC0D8F979B2F0A9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:00+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Species distribution modeling</term>
					<term>Biodiversity</term>
					<term>LifeCLEF</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Understanding the spatial and temporal distribution of plant species is important for many biodiversity management and conservation scenarios. This paper presents a solution to the GeoLifeCLEF challenge, which involves predicting the presence of plant species using satellite images and time series, climate time series, and other rasterized environmental data. The multimodal model leveraged satellite images, bioclimatic cubes, and feature vectors of satellite time series and environmental scalar values. With the selected presence probability threshold for inference, this method reached an 𝐹1-score of 0.347 on the public and 0.345 on the private leaderboard, placing us 9th.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The GeoLifeCLEF 2024 competition <ref type="bibr" target="#b0">[1]</ref> is held jointly as part of the LifeCLEF 2024 lab <ref type="bibr" target="#b1">[2]</ref> and the FGVC11 workshop. As in the GeoLifeCLEF 2023 competition <ref type="bibr" target="#b2">[3]</ref>, the goal is to predict a list of species most likely to be observed at a given location using various geographical and environmental data such as satellite images and time series, climatic time series, and other rasterized data: land cover, human footprint, bioclimatic, and soil variables. The task of species distribution modelling <ref type="bibr" target="#b3">[4]</ref> typically involves challenges associated with imbalance between species presence and absence in the data, large-scale multimodal learning, and plant species diversity. Its results could be useful for predicting biodiversity change and mitigating environmental pressures from human activities.</p><p>The GeoLifeCLEF 2024 training data includes a collection of observations of plants in Europe. Each survey consists of a list of plant species with GPS coordinates and a set of variables characterizing the landscape and environment around them. There are around 90K surveys with around 5K unique plant species in the dataset. This technical report presents the selected approach to the competition: a multimodal network based on bioclimatic cubes, Sentinel image patches (RGB patch and NIR patch) and a vector of climate, elevation, human footprint, land cover, SoilGrids and Landsat time series data. Training code can be found here<ref type="foot" target="#foot_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Data and Evaluation Metric</head><p>Data plays an important role in predicting plant species distribution at a given location and time. In this section, we briefly present the data and the evaluation metric used for the competition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Data</head><p>This paragraph describes the standard GeoLifeCLEF 2024 dataset. The training dataset contains presence-absence (PA) surveys and presence-only (PO) surveys. PO data includes about 5 million observations and reports only the presence, not the absence, of certain plant species in specific areas. PA data, on the other hand, comprises around 90K surveys with about 5K unique species of the European flora and reports both presence and absence of plant species. Only presence-absence surveys were used in our solution, and everywhere below the report refers only to this type of data. The total number of surveys in the test set was 5K.</p><p>The distribution of the number of observations per plant species in the training dataset is shown in Figure <ref type="figure" target="#fig_0">1</ref>. Almost 50% of plant species in the training data have fewer than 16 occurrences and only 20% have more than 110. Almost all observations were made in Western Europe; a map of locations is shown in Figure <ref type="figure" target="#fig_1">2</ref>. More detailed descriptions can be found on the competition's homepage<ref type="foot" target="#foot_1">2</ref>.</p><p>Each survey is paired with the following covariates:</p><p>• Satellite image patches: 128m×128m RGB-NIR patches centered at each observation, at a resolution of 1 meter per pixel; • Satellite time series: up to 20 years of values for six satellite bands (R, G, B, NIR, SWIR1, and SWIR2); • Environmental rasters: various climatic, pedologic, land use, and human footprint variables at the European scale, provided as scalar values, time series, and original rasters.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Evaluation Metric</head><p>The evaluation metric for the GeoLifeCLEF 2024 competition is the samples-averaged 𝐹 1 -score computed on a set of species presence-absence samples. The 𝐹 1 -score measures the overlap between the predicted and actual sets of species present at a given location and time. Each observation 𝑖 is associated with a list of ground-truth labels 𝑌 𝑖 corresponding to the observed plant species. For each observation, the submissions provide a set of predicted present species 𝑃 𝑖,1 , 𝑃 𝑖,2 , ..., 𝑃 𝑖,𝑅 𝑖 . The samples-averaged 𝐹 1 -score is then computed as:</p><formula xml:id="formula_0">F_1 = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + (FP_i + FN_i)/2}</formula><p>where 𝑇 𝑃 𝑖 , 𝐹 𝑃 𝑖 and 𝐹 𝑁 𝑖 are the true positives, false positives and false negatives of the 𝑖-th sample, respectively, and 𝑁 is the number of samples for evaluation.</p></div>
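As an illustration, the samples-averaged 𝐹1 above can be computed with a few lines of Python (a minimal sketch; the function names are ours, not part of the competition toolkit):

```python
def sample_f1(true_species, pred_species):
    # Per-survey F1: overlap between predicted and ground-truth species sets.
    tp = len(true_species & pred_species)
    fp = len(pred_species - true_species)
    fn = len(true_species - pred_species)
    if tp + fp + fn == 0:
        return 1.0  # empty prediction matching empty ground truth
    return tp / (tp + (fp + fn) / 2)

def samples_averaged_f1(y_true, y_pred):
    # y_true, y_pred: lists of species-id sets, one per survey.
    return sum(sample_f1(t, p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example: two surveys
y_true = [{1, 2, 3}, {4, 5}]
y_pred = [{2, 3, 7}, {4, 5}]
score = samples_averaged_f1(y_true, y_pred)  # (2/3 + 1) / 2
```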
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><p>This section describes the methods that were tried during the competition. Our strategy was centered around the baseline model <ref type="foot" target="#foot_2">3</ref> provided by the competition organizers; its 𝐹 1 -score is 0.31 on the public set. This model leveraged all environmental data and utilized a multimodal neural network with separate feature extractors, returning a single prediction set in order to take advantage of every modality (satellite images, bioclimatic cubes, Landsat cubes). The main change was to replace the Landsat cubes with a vector of satellite time series and environmental scalar values, referred to below as the feature vector. In addition, only plant species with an occurrence number greater than 10 were used to train the model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Feature vector</head><p>The feature vector consists of climate, elevation, human footprint, land cover, SoilGrids and Landsat time series data. Methods for compiling this data are taken from the public notebook <ref type="foot" target="#foot_3">4</ref>. Climatic time series data were merged within a 10-year time window. Some positions had missing values, which were filled with spatial interpolation: since measurements were dense near the missing regions, missing values were filled with values from the nearest neighbors. Finally, each survey had 1198 feature-vector values. The train and test versions can be found here. Before being passed to the model, feature vectors are normalized with a standard scaler.</p></div>
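The gap-filling and normalization steps described above can be sketched as follows (a simplified stand-in with illustrative array and function names; the actual preprocessing follows the referenced public notebook):

```python
import numpy as np

def fill_nearest(coords, values):
    """Fill NaNs with the value of the nearest survey (Euclidean distance
    on coordinates) that has a measurement. Brute-force sketch."""
    values = values.copy()
    known = ~np.isnan(values)
    for i in np.where(~known)[0]:
        d = np.linalg.norm(coords[known] - coords[i], axis=1)
        values[i] = values[known][np.argmin(d)]
    return values

def standard_scale(X):
    """Zero-mean, unit-variance normalization per feature column."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / np.where(sigma == 0, 1.0, sigma)

coords = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
vals = np.array([10.0, np.nan, 30.0])
filled = fill_nearest(coords, vals)  # nearest neighbor of (0,1) is (0,0)
```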
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Model architecture</head><p>The architecture closely follows the baseline model, incorporating a multimodal neural network that utilizes three distinct feature extractors for bioclimatic rasters (19 channels), satellite images (4-channel RGB with NIR), and feature vectors (1198 values). Their outputs are combined and processed through fully connected layers to generate predictions. The first, bioclimatic head consists of layer normalization, a ResNet18 <ref type="bibr" target="#b4">[5]</ref> without pretrained weights, and dropout <ref type="bibr" target="#b5">[6]</ref> with a 0.1 probability. The second, image head employs a Swin Transformer <ref type="bibr" target="#b6">[7]</ref> model with ImageNet <ref type="bibr" target="#b7">[8]</ref> weights and a dropout layer with a 0.1 probability. Prior to this stage, image data undergo augmentations such as random rotation, random brightness/contrast, and normalization. The third head comprises layer normalization and three linear layers with the GELU <ref type="bibr" target="#b8">[9]</ref> activation function, along with dropout at a 0.1 probability (the first layer maps from 1198 to 1198, the second and third layers map to 1000 outputs). Subsequently, the bioclimatic and feature outputs are normalized and combined with the image output. The final classifier is constructed with three linear layers using the GELU activation function and dropout at a 0.1 probability.</p></div>
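A shape-level sketch of this three-head design follows. The NumPy stand-ins below replace the trained ResNet18, Swin Transformer, and MLP heads with random linear maps, so only the tensor dimensions, not the learned behavior, are representative; all names and input sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, n_out):
    # Random-weight stand-in for a trained layer (illustrative only).
    W = rng.normal(size=(x.shape[-1], n_out)) * 0.01
    return x @ W

batch = 2
bioclim = rng.normal(size=(batch, 19 * 64))  # stand-in for flattened 19-channel raster features
image = rng.normal(size=(batch, 4 * 64))     # stand-in for RGB+NIR patch features
feats = rng.normal(size=(batch, 1198))       # the 1198-value feature vector

h_bio = linear(bioclim, 1000)              # bioclimatic head -> 1000 outputs
h_img = linear(image, 768)                 # image head -> 768 (Swin-T embedding size)
h_ft = linear(linear(feats, 1198), 1000)   # feature head: 1198 -> 1198 -> 1000

fused = np.concatenate([h_bio, h_img, h_ft], axis=1)  # 2768 fused features
logits = linear(fused, 2857)               # classifier over the 2857 kept species
```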
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Training and inference</head><p>The model was trained on PA data for 12 epochs using the Adam optimizer with a learning rate of 8e-5, binary cross entropy (BCE) loss, and a batch size of 128. During training, we focused on plant species with an occurrence number greater than 10, resulting in 2857 unique species out of a total of 5015. It is worth highlighting that this occurrence threshold was determined through experimentation.</p><p>In the final approach to inference, the strategy used in the baseline notebook was changed. Rather than forecasting the 25 most probable species for every observation in the test dataset, a selected threshold of 0.18 was used: species with probabilities surpassing this value were classified as present. Additionally, test observations with fewer than 4 predicted species were assigned the 4 most likely species.</p></div>
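The inference rule described above can be sketched as follows (an illustrative implementation; the function name is ours, not from the released code):

```python
import numpy as np

def predict_species(probs, threshold=0.18, min_species=4):
    """Species whose probability exceeds `threshold` are predicted present;
    surveys with fewer than `min_species` predictions fall back to the
    top-`min_species` most probable species."""
    out = []
    for p in probs:
        idx = np.where(p > threshold)[0]
        if len(idx) < min_species:
            idx = np.argsort(p)[::-1][:min_species]
        out.append(set(idx.tolist()))
    return out

probs = np.array([
    [0.9, 0.5, 0.3, 0.2, 0.1],       # four species above the threshold
    [0.05, 0.04, 0.03, 0.02, 0.01],  # none above -> top-4 fallback
])
preds = predict_species(probs)
```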
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Experimental settings</head><p>Experiments were conducted with the multimodal network described in Section 3.2. The detailed training settings are shown in Table <ref type="table" target="#tab_0">1</ref>. For comparing different versions of the model, we used the 25 most probable species in order to remove the bias introduced by the probability threshold described in Section 3.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Usage of feature vector</head><p>In order to investigate the impact of the feature vector head, we conducted an ablation study. Table <ref type="table">2</ref> presents the detailed results. With the selected hyperparameters, the combination of bioclimatic, image and feature heads gives the best performance of around 0.32 on both public and private scores; the other configurations score about 0.31 or less.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Imbalanced data</head><p>As mentioned before, the dataset is strongly imbalanced: for almost all species, the number of observations detecting their presence is much smaller than the number detecting their absence. We tried to address this problem in several ways, for example by adding pos_weight to the BCE loss and by adding different data augmentations. The final option was to limit the set of species the model is trained on, taking only those with an occurrence number greater than 10. Table <ref type="table" target="#tab_1">3</ref> shows how the score depends on the occurrence-number threshold. Another change was lowering the probability threshold above which a species was considered present. For observations with fewer than 4 species present, we assigned the 4 most likely plant species. Results for different probability thresholds are presented in Table <ref type="table" target="#tab_2">4</ref>.</p></div>
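The occurrence-count filtering used for training can be sketched as follows (a toy illustration; the helper name is ours):

```python
from collections import Counter

def filter_rare_species(survey_labels, min_occurrences=10):
    """Keep only species appearing in more than `min_occurrences` surveys,
    as in the final training setup. Sketch on lists of label lists."""
    counts = Counter(s for labels in survey_labels for s in set(labels))
    kept = {s for s, c in counts.items() if c > min_occurrences}
    return [[s for s in labels if s in kept] for labels in survey_labels]

# Toy example: species "a" appears in 3 surveys, "b" in only 1
surveys = [["a", "b"], ["a"], ["a"]]
filtered = filter_rare_species(surveys, min_occurrences=2)
```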
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>We presented the working principles of our submission to the GeoLifeCLEF 2024 challenge and discussed some of the key findings. We have not conducted an extensive, let alone exhaustive, hyperparameter search and believe that doing so could raise performance further. The main contributions were choosing a proper model architecture, selecting the training data, and changing the inference strategy. In the final solution, we did not use PO data or the training strategies used in previous years <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref>. Using more data would likely improve generalization, and it is certainly high on the list of improvements to be made. Further gains might also come from searching for better backbone models for the different modalities, such as Inception-v4 <ref type="bibr" target="#b11">[12]</ref> or Vision Transformer (ViT-B/16) <ref type="bibr" target="#b12">[13]</ref>, and from using an ensemble of various models.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Histogram of the number of occurrences of plant species in the training dataset. The horizontal axis uses a logarithmic scale for readability.</figDesc><graphic coords="3,113.87,84.19,367.54,190.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Map of Europe with the observation distribution. Train data locations are green points; test data locations are red points.</figDesc><graphic coords="3,169.60,319.94,256.08,170.94" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Selected multimodal architecture. The bioclimatic, image and feature heads map to 1000, 768 and 1000 outputs, respectively. The stacked outputs then pass through linear layers mapping to the 2857 species (species with occurrence number &gt; 10).</figDesc><graphic coords="5,89.29,84.19,425.70,242.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Training hyper-parameters</figDesc><table><row><cell></cell><cell cols="2">Hyper-parameters</cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>Batch size</cell><cell></cell><cell>128</cell><cell></cell><cell></cell></row><row><cell></cell><cell>Optimizer</cell><cell></cell><cell>Adam</cell><cell></cell><cell></cell></row><row><cell></cell><cell>Learning rate</cell><cell></cell><cell>8e-5</cell><cell></cell><cell></cell></row><row><cell></cell><cell>Lr scheduler</cell><cell></cell><cell>CosineAnnealingLR</cell><cell></cell><cell></cell></row><row><cell></cell><cell cols="2">Number of epochs</cell><cell>12</cell><cell></cell><cell></cell></row><row><cell>Table 2</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="2">Ablation study of the feature vector head usage</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="4">Bioclimatic head Image head Feature head Landsat head</cell><cell cols="2">𝐹 1 -score Public Private</cell></row><row><cell>✓</cell><cell>✓</cell><cell>-</cell><cell>✓</cell><cell>0.315</cell><cell>0.316</cell></row><row><cell>✓</cell><cell>✓</cell><cell>✓</cell><cell>✓</cell><cell>0.317</cell><cell>0.317</cell></row><row><cell>-</cell><cell>✓</cell><cell>✓</cell><cell>✓</cell><cell>0.306</cell><cell>0.311</cell></row><row><cell>✓</cell><cell>✓</cell><cell>✓</cell><cell>-</cell><cell>0.322</cell><cell>0.323</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3</head><label>3</label><figDesc>Score depending on the number of occurrences of plant species for model training</figDesc><table><row><cell>Species with number of occurrences</cell><cell cols="2">𝐹 1 -score Public Private</cell></row><row><cell>&gt;0 (5096 in total)</cell><cell>0.322</cell><cell>0.323</cell></row><row><cell>&gt;5 (3425 in total)</cell><cell>0.322</cell><cell>0.326</cell></row><row><cell>&gt;10 (2857 in total)</cell><cell>0.326</cell><cell>0.329</cell></row><row><cell>&gt;15 (2511 in total)</cell><cell>0.324</cell><cell>0.328</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4</head><label>4</label><figDesc>Score depending on the presence probability threshold</figDesc><table><row><cell>Probability threshold</cell><cell cols="2">𝐹 1 -score Public Private</cell></row><row><cell>0.4</cell><cell>0.309</cell><cell>0.303</cell></row><row><cell>0.3</cell><cell>0.334</cell><cell>0.332</cell></row><row><cell>0.2</cell><cell>0.346</cell><cell>0.345</cell></row><row><cell>0.15</cell><cell>0.345</cell><cell>0.342</cell></row><row><cell>0.1</cell><cell>0.329</cell><cell>0.327</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.kaggle.com/code/lonansyayf/baseline-with-modifications/notebook</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://www.kaggle.com/competitions/geolifeclef-2024/data</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://www.kaggle.com/code/picekl/sentinel-landsat-bioclim-baseline-0-31626</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://www.kaggle.com/code/gobyeonggeon/preprocess-visualize-spatial-data-eda-xgb</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of GeoLifeCLEF 2024: Species presence prediction based on occurrence data and high-resolution remote sensing images</title>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Botella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Servajean</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Deneu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">Marcos</forename><surname>Gonzalez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Palard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Larcher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Leblanc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Estopinan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bonnet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joly</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2024 -Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of lifeclef 2024: Challenges on species distribution prediction and identification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Joly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Picek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Goëau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Espitalier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Botella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Deneu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Marcos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Estopinan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Leblanc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Larcher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Šulc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hrúz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Servajean</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Matas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Overview of GeoLifeCLEF 2023: Species presence prediction based on occurrence data and high-resolution remote sensing images</title>
		<author>
			<persName><forename type="first">C</forename><surname>Botella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Deneu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Estopinan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Servajean</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">Marcos</forename><surname>Gonzalez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joly</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2023 -Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">New trends in species distribution modelling</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">E</forename><surname>Zimmermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">C</forename><surname>Edwards</surname><genName>Jr</genName></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Graham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">B</forename><surname>Pearman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-C</forename><surname>Svenning</surname></persName>
		</author>
		<idno type="DOI">10.1111/j.1600-0587.2010.06953.x</idno>
	</analytic>
	<monogr>
		<title level="j">Ecography</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="985" to="989" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Deep residual learning for image recognition</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2016.90</idno>
	</analytic>
	<monogr>
		<title level="m">2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="770" to="778" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Dropout: A simple way to prevent neural networks from overfitting</title>
		<author>
			<persName><forename type="first">N</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="1929" to="1958" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Swin transformer: Hierarchical vision transformer using shifted windows</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Guo</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICCV48922.2021.00986</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">ImageNet: a Large-scale hierarchical image database</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F.-F</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2009.5206848</idno>
	</analytic>
	<monogr>
		<title level="m">2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="248" to="255" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Hendrycks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1606.08415</idno>
		<title level="m">Gaussian error linear units (GELUs)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Leverage samples with single positive labels to train CNNbased models for multi-label plant species prediction</title>
		<author>
			<persName><forename type="first">H</forename><surname>Ung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kojima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wada</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Block label swap for species distribution modelling</title>
		<author>
			<persName><forename type="first">B</forename><surname>Kellenberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tuia</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Inception-v4, Inception-ResNet and the impact of residual connections on learning</title>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ioffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vanhoucke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Alemi</surname></persName>
		</author>
		<idno type="DOI">10.1609/aaai.v31i1.11231</idno>
	</analytic>
	<monogr>
		<title level="j">AAAI Conference on Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">An image is worth 16x16 words: Transformers for image recognition at scale</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dosovitskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Beyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kolesnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Weissenborn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Unterthiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dehghani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Minderer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Heigold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Houlsby</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.11929</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
