<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Fine-grained visual classification of fish</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Piotr</forename><surname>Żerdziński</surname></persName>
							<email>piotzer046@student.polsl.pl</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Applied Mathematics</orgName>
								<orgName type="institution">Silesian University of Technology</orgName>
								<address>
									<addrLine>Kaszubska 23</addrLine>
									<postCode>44100</postCode>
									<settlement>Gliwice</settlement>
									<country key="PL">POLAND</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">IVUS2024: Information Society</orgName>
								<orgName type="institution">University Studies</orgName>
								<address>
									<addrLine>2024, May 17</addrLine>
									<settlement>Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Fine-grained visual classification of fish</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">1EDC3F6F91225B1742A1B5FE7259F953</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:27+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>cnn</term>
					<term>attention</term>
					<term>classification</term>
					<term>fine-grained visual classification</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Fine-grained visual classification (FGVC) is the task of classifying images belonging to the same metaclass. This problem is challenging due to the small differences between classes and the small amount of available data. In this paper, a fine-grained classification model based on the attention mechanism is proposed. Attention allows the model to focus on the small differences that determine class membership. The model was tested on the Croatian Fish Dataset and achieved an accuracy of 94.375%.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Image processing is one of the fundamental categories of problems faced by AI, and image classification is its main subtype. There is great potential in approaching data classification through diverse methodologies. In the case of artificial neural networks, the most important tool is the convolutional network. However, no single architecture can solve every classification problem, so various solutions are modeled that focus on classifying different features. One example is transfer learning, i.e., reusing neural models trained on huge databases: their weights are taken as a starting point and further trained on new data <ref type="bibr" target="#b0">[1]</ref>. The capabilities of neural networks are also supported by additional techniques such as attention modules, as shown in <ref type="bibr" target="#b1">[2]</ref>, where an attention module was used in recurrent networks.</p><p>The problem of fine-grained visual classification, on the other hand, involves classifying images belonging to the same metaclass. It is thus a more challenging task, since the differences between classes are small and may concern only a small part of the image. For example, classifying bird species occurring in a given area may require perceiving differences only in the shape and color of the feathers <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>, while classifying aircraft requires recognizing differences in, for example, the shape of the wings <ref type="bibr" target="#b4">[5]</ref>. In this paper, a fine-grained visual classification of the Croatian Fish Dataset is performed.
In this case, it is important not only to focus on individual small differences between species but also to address the problems arising from the specifics of the dataset, i.e., poor visibility and noise.</p><p>Therefore, due to the specificity of the problem, it is necessary to find and focus on the most informative regions, which constitute the difference between classes <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11]</ref>.</p><p>For this reason, it became necessary to implement an attention mechanism that extracts the most important fragments.</p><p>In this paper, a Fine-Grained Visual Classification (FGVC) model is presented. First, simple preprocessing coupled with image augmentation is performed. Second, a CNN architecture implementing the attention mechanism crucial to the analyzed problem is presented, along with skip connections, which support the extraction of the most relevant image elements. The main contributions of this paper are:</p><p>• a new network architecture based on an attention module and skip connections, • a new CNN-based FGVC model, • a scheme for further extension with more efficient photo preprocessing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methodology</head><p>Working with images taken underwater, we are forced to solve problems caused by the quality of the analyzed images: noise, discoloration and distortion. Water and particles dispersed in it (pollution, plankton, etc.) absorb, scatter and reflect sunlight. The extreme wavelengths of the visible range are absorbed particularly strongly. The dominant colors of the images are therefore green and blue, and for this reason any differences due to the color of the fish are lost and become almost invisible.</p><p>In other works, the proposed architectures implemented generative models to improve the quality of the analyzed images <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref>. Potentially, denoising and color correction methods, also based on generative models, could be applied to the fish classification problem <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18]</ref>. However, the common feature of both approaches is a significant increase in the complexity of the solution. The presented architecture focuses on achieving maximum efficiency with minimal complexity, so the analyzed images were processed only by simple transformations. After preprocessing, the data was divided into training and test sets at a ratio of 80% to 20%. Due to the initial imbalance in the number of elements in each class, this imbalance also carried over into the training and test splits. The data prepared in this way was then used for training and evaluation of the model.</p><p>The proposed model, shown in Figure <ref type="figure" target="#fig_3">3</ref>, uses images with a size of 64x64 px.
This size was determined based on the minimum size of the photos in the database and represents a compromise between the size required to keep the photos informative and the excessive deformation caused by upscaling them. Higher image dimensionality did not improve the efficiency of the model, as the initial images were small, but only increased the computational complexity.</p><p>The basic elements of the presented architecture are two convolutional blocks, with the fixed composition shown in Figure <ref type="figure" target="#fig_2">2</ref>. The first contains a convolution layer, followed by batch normalization, a GELU activation function and a pooling layer, whose output is subjected to spatial dropout with a probability of 20%. The second differs only in lacking the pooling layer. Another important component, following each block except the last, is CBAM (Convolutional Block Attention Module), which introduces the attention mechanism into the model. The skip connection mechanism, which passes information from the first block deep into the model, requires a convolution layer with a kernel size of 1x1 - in this way, the dimensionality can be adjusted so that the outputs of the two blocks can be combined. At the very end, two fully connected layers are implemented. The first is preceded by dropout with a probability of 20% and followed by the GELU function. The output of the second passes through the LogSoftmax function, which returns the log-probability for each class.</p></div>
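The block structure described above can be sketched in PyTorch. This is a minimal illustration, not the authors' code: the channel counts and the 3x3 kernel size are assumptions (the paper only states that the filters remain small), while the GELU activation, 2x2 average pooling and 20% spatial dropout follow the text.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # One convolutional block: convolution, batch normalization, GELU,
    # optional average pooling, then spatial dropout (p = 0.2).
    # The second block variant in the paper simply omits the pooling layer.
    def __init__(self, in_ch, out_ch, pool=True):
        super().__init__()
        layers = [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # kernel size is an assumption
            nn.BatchNorm2d(out_ch),
            nn.GELU(),
        ]
        if pool:
            layers.append(nn.AvgPool2d(2))      # average pooling, filter size 2
        layers.append(nn.Dropout2d(p=0.2))      # spatial dropout removes whole feature maps
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

x = torch.randn(1, 3, 64, 64)   # 64x64 px input, as stated in the paper
y = ConvBlock(3, 32)(x)
print(y.shape)                  # torch.Size([1, 32, 32, 32])
```

A 1x1 convolution on the skip path, as mentioned above, would be `nn.Conv2d(32, out_ch, kernel_size=1)`, matching channel counts so the two outputs can be combined.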
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Preprocessing images</head><p>In the proposed architecture, the image preprocessing phase has been reduced to a minimum. As the dataset is loaded, each image is brightened by increasing the value of each pixel by 50%. For very blurry images, brightening significantly separates the fish from the background, while for sharper ones, the fish's features become more visible. An example application of this transformation is shown in Figure <ref type="figure" target="#fig_1">1</ref>.</p><p>The next step is augmentation of the training data. It is necessary due to the small size of the initial set (see Table <ref type="table" target="#tab_0">1</ref>). Orientation is irrelevant in the analyzed dataset, so two transformations were applied without fear of losing informativeness: vertical reflection and horizontal reflection. Both transformations are applied with a probability of 90%.</p><p>The original photo and its brightened modification are shown below.</p></div>
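This preprocessing can be sketched in NumPy, assuming 8-bit RGB images; the function names `brighten` and `augment` are illustrative, not taken from the paper.

```python
import numpy as np

def brighten(image, factor=1.5):
    # Raise every pixel value by 50% (factor 1.5), clipping to the 8-bit range.
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def augment(image, rng, p=0.9):
    # Vertical and horizontal reflections, each applied with probability 90%;
    # orientation carries no class information for this dataset.
    if p > rng.random():
        image = np.flipud(image)
    if p > rng.random():
        image = np.fliplr(image)
    return image

rng = np.random.default_rng(0)
img = np.full((64, 64, 3), 100, dtype=np.uint8)
print(brighten(img)[0, 0])      # [150 150 150]
```

Brightening is applied to every image at load time, while the reflections are drawn independently for each training sample.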
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">CNN model</head><p>To analyze and classify the images, convolutional neural networks (CNNs), which are dedicated solutions for computer vision problems, were used. The operating principle of a CNN is based on analyzing an image, represented as a matrix, through a series of layers and functions aimed at classifying or segmenting it. The most important of these is the convolution layer, which detects basic features of an image through the process of convolution, that is, element-wise multiplication and summation between the image and a set of filters. The filters, also represented as matrices, are initialized randomly and corrected during the learning process. The problem of fine-grained visual classification requires a detailed analysis of the image due to the small differences between classes, so the filter sizes remain small in the proposed solution.</p><p>Pooling layers reduce the spatial dimensions of the input feature maps. The proposed model uses average pooling with a filter size of 2. Importantly, despite the reduction of dimensionality, the pooling layer does not cause a significant loss of informativeness. This operation is described by the formula:</p><formula xml:id="formula_0">P_c(i,j) = \frac{1}{k^2} \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} F_c(ki+m, kj+n), <label>(1)</label></formula><p>where 𝐹 𝑐 is the input feature map of dimensions 𝐶 × 𝑊 × 𝐻, with 𝐶 the number of channels, 𝑊 the width and 𝐻 the height, and 𝑘 is the pooling filter size.</p><p>The other elements present in each block of the presented model (see Figure <ref type="figure" target="#fig_2">2</ref>) are batch normalization, the GELU activation function and dropout. Due to the small size of the analyzed dataset, it became necessary to prevent overfitting. For this reason, spatial dropout is used, which removes entire feature maps, whereas classical dropout disables individual neurons. 
The use of the GELU function introduces nonlinearity, which allows the model to learn more complex relationships. Moreover, it is more efficient than ReLU and ELU <ref type="bibr" target="#b18">[19]</ref>. Batch normalization, in turn, stabilizes and speeds up the learning process, reduces internal covariate shift, improves convergence and provides slight regularization <ref type="bibr" target="#b19">[20]</ref>. This process can be represented by the formula:</p><formula xml:id="formula_2">y_i = \gamma \hat{x}_i + \beta, \quad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, <label>(2)</label></formula><p>where 𝜇 𝐵 and 𝜎 𝐵 ² are the mean and variance of the batch, 𝜖 is a small constant, and 𝛾 and 𝛽 are learnable parameters. The fully connected layer, also known as the dense layer, implements the classic linear approach, in which each neuron in a given layer is connected to every neuron in the previous layer and passes its activation to every neuron in the next layer. The proposed model uses two dense layers, the first of which applies dropout, in the form described above, before its linear transformation and passes the result to the GELU activation function. The last layer, with an output size corresponding to the number of classes, passes its result through the LogSoftmax function. The task of this function is to normalize the results of the model into a log-probability distribution. It is expressed by the formula:</p><formula xml:id="formula_3">\mathrm{LogSoftmax}(x_i) = \log \frac{e^{x_i}}{\sum_j e^{x_j}}. <label>(3)</label></formula><p>The presented model also implements an attention mechanism that selectively focuses on parts of the analyzed image, assigning different weights to different areas. The implemented attention mechanism is based on the attention block CBAM (Convolutional Block Attention Module <ref type="bibr" target="#b20">[21]</ref>), consisting of Channel Attention and Spatial Attention.</p><p>Due to the depth of the presented model and the small amount of data in the analyzed set, the proposed model uses a skip-connections mechanism to help combat the degradation problem. 
The most important feature of the skip-connections mechanism is the ability to transfer low-level features, captured in the initial layers of the network, to deeper layers, where they are mixed with high-level features. In the proposed model, skip connections are arranged according to the DenseNet architecture <ref type="bibr" target="#b21">[22]</ref>, i.e., the result of the first block is passed to each subsequent layer. However, the results of subsequent layers are not combined further. At each stage, the convolutional blocks analyze the current feature maps combined with the low-level features from the first convolutional block.</p><p>During training, the model tries to match the real data as closely as possible; for this reason, it is necessary to use a loss function. This function quantifies how much the model's predictions deviate from the actual data, and minimizing it is the main goal of training. The proposed solution uses the cross-entropy loss function, expressed by the formula:</p><formula xml:id="formula_4">L = - \sum_{i=1}^{N} t_i \log(p_i), <label>(4)</label></formula><p>where 𝑁 is the number of classes, 𝑡 is the true distribution and 𝑝 is the predicted distribution. Complementary to minimizing the loss function is the selection of new values of the model parameters based on the value of this function. This role is played by the optimization algorithm, which in the proposed solution is Adam (Adaptive Moment Estimation). </p></div>
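The combination of a LogSoftmax output, the cross-entropy loss and the Adam optimizer can be illustrated with a short PyTorch sketch. The toy single-layer classifier here is only a stand-in for the full architecture, meant to show the training-loop wiring: with log-probability outputs, `nn.NLLLoss` computes exactly the cross-entropy of formula (4).

```python
import torch
import torch.nn as nn

# Toy classifier head ending in LogSoftmax (12 fish species); the real model
# is the convolutional architecture described above.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 12), nn.LogSoftmax(dim=1))

# With log-probabilities as input, NLLLoss reproduces cross-entropy.
criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(8, 3, 64, 64)       # random stand-in batch
labels = torch.randint(0, 12, (8,))

for _ in range(3):                        # a few illustrative optimization steps
    optimizer.zero_grad()
    log_probs = model(images)
    loss = criterion(log_probs, labels)   # minimize cross-entropy
    loss.backward()
    optimizer.step()

print(log_probs.shape)                    # torch.Size([8, 12])
```

An equivalent formulation would feed raw logits to `nn.CrossEntropyLoss`, which fuses LogSoftmax and NLLLoss internally.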
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experiments</head><p>This section is devoted to analyzing the results obtained for the Croatian Fish Dataset. The results were obtained for two approaches: in the first, the images were not preprocessed at all and were not augmented, while in the second, the full preprocessing described in Section 2.1 was implemented. The results obtained were compared to the results of the authors of the dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Database</head><p>The analyzed dataset was prepared by researchers from Fulda University of Applied Sciences, Friedrich Schiller University Jena and the University of Zadar <ref type="bibr" target="#b22">[23]</ref>. The Croatian Fish Dataset contains 794 photos of 12 species of fish found in the Adriatic Sea in Croatia (see Table <ref type="table" target="#tab_0">1</ref>). The images are a subset of the main dataset, which consists of videos at 1280x960 px and 1920x1080 px resolution. Each fish detected in the source footage was marked with a bounding box and extracted as a separate photo. For this reason, the sizes of the images in the analyzed database vary from 19 × 23 px to over 500 × 200 px. Because the photos are cropped from larger frames, the position and visibility of the fish, as well as the type of background and its lighting, also vary.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Results and discussion</head><p>The model was trained for 200 epochs with a batch size of 64. The graphs of the accuracy and loss values for both models are shown in Figures <ref type="figure" target="#fig_5">4a and 4b</ref>. In the final stage of training, the graphs for the models trained with and without preprocessing converge to identical values - accuracy reaches nearly 100% with a loss value of about 0.1. However, the accuracy graph for the model with full preprocessing converges to the maximum faster, and the difference in the loss value during training is also evident. Moreover, the loss value for the case without preprocessing is more chaotic even in the final stage, which translates into fluctuations in accuracy.</p><p>After each epoch, the models were evaluated on a test set to check their accuracy, precision, recall and F1-score. The corresponding graphs can be seen in Figures <ref type="figure" target="#fig_6">5a, 5b, 5c, and 5d</ref>. The model based on the preprocessed data reaches higher values of all metrics in the vast majority of epochs, the exceptions being around epochs 150 and 165. The regularity from the training stage is also observed during evaluation: the model trained on the preprocessed data converges to maximum values faster, reaching 80% in epoch 14, 90% in epoch 43, and ultimately its maximum of 96.88% in epoch 174. Meanwhile, the model trained on the non-preprocessed set reaches 80% accuracy only in the 30th epoch and surpasses 90% only 4 times, with a maximum of 91.41% achieved in the 151st epoch.
The other metrics show similar patterns: the values for the preprocessed data converge more quickly to their maxima, reaching values equal to or higher than 90%, while the model trained without preprocessing does not reach this level except in the same 4 epochs in which accuracy did.</p><p>Notably, each metric fluctuated much more during evaluation than during training for both models. During the training stage, such behavior was evident only for the model based on non-preprocessed data, as its training set contained only 635 elements, while the more stable training of the second model had 1778 images at its disposal. It can thus be assumed that the stability of the results depends, among other things, on the size of the test set. The test data consisted of 159 elements for both models, which, together with the imbalance of class sizes, leads to visible fluctuations close to the maximum value of each metric.</p><p>A summary of the obtained results can be found in Table <ref type="table" target="#tab_1">2</ref>, which shows the maximum results obtained by the presented architecture for models trained with and without preprocessing. Also included is the accuracy obtained by the authors of the analyzed dataset, 66.75%, achieved using a pre-trained CNN with an SVM for the classification part <ref type="bibr" target="#b22">[23]</ref>. Another well-known solution in the literature is transfer learning <ref type="bibr" target="#b23">[24]</ref>, where the authors achieved an accuracy of 83.92%. A similar approach was shown in <ref type="bibr" target="#b24">[25]</ref>, where a deep CNN was described, reaching an accuracy of 95.64%. Compared to these works known from the literature, the proposed solution achieves a higher accuracy of 96.88%. 
This is due to the deep network, which was extended with an attention module. This solution allowed the classifier to focus on the important features of the classified objects.</p></div>
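The per-epoch metrics reported above can be computed as in the following sketch. This is a generic macro-averaged implementation, not the authors' evaluation code; macro averaging is one plausible choice given the class imbalance noted earlier.

```python
def macro_metrics(y_true, y_pred, n_classes):
    # Accuracy plus per-class precision/recall/F1 averaged over classes (macro),
    # which weights rare and common species equally.
    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    n = n_classes
    return acc, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

print(macro_metrics([0, 0, 1, 1], [0, 1, 1, 1], 2))  # accuracy is 0.75 on this toy example
```

In practice these four values would be logged after every epoch to produce curves like those in Figure 5.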
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>This paper proposes an attention-based CNN model supported by a simple preprocessing process. The architecture was tested on the Croatian Fish Dataset twice, once with the data subjected to preprocessing and once without. The results achieved are the highest reported for this dataset. In the future, emphasis should be placed on:</p><p>• developing a better method of image preprocessing and more efficient data augmentation based on generative models, • designing a more efficient and accurate attention mechanism that would more precisely select the key elements of the image.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>(a) Original photo. (b) Photo brightened by 50%.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Image before and after preprocessing.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Two convolution blocks forming the presented model.</figDesc><graphic coords="4,130.95,316.10,343.80,86.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: CNN architecture.</figDesc><graphic coords="5,130.95,476.60,335.35,178.15" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>(a) Accuracy before and after preprocessing (b) Loss before and after preprocessing</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Accuracy and loss function during training process before and after preprocessing.</figDesc><graphic coords="7,109.75,99.85,157.85,117.65" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Performance metrics before and after preprocessing.</figDesc><graphic coords="7,109.80,467.85,157.85,117.65" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Number of images per species.</figDesc><table><row><cell>Species</cell><cell>Number of images</cell></row><row><cell>Chromis chromis</cell><cell>106</cell></row><row><cell>Coris julis female</cell><cell>57</cell></row><row><cell>Coris julis male</cell><cell>57</cell></row><row><cell>Diplodus annularis</cell><cell>94</cell></row><row><cell>Diplodus vulgaris</cell><cell>111</cell></row><row><cell>Oblada melanura</cell><cell>57</cell></row><row><cell>Serranus scriba</cell><cell>56</cell></row><row><cell>Spondyliosoma cantharus</cell><cell>51</cell></row><row><cell>Spicara maena</cell><cell>49</cell></row><row><cell>Symphodus melanocercus</cell><cell>105</cell></row><row><cell>Symphodus tinca</cell><cell>34</cell></row><row><cell>Sarpa salpa</cell><cell>17</cell></row><row><cell>Total</cell><cell>794</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>The results of the proposed method compared to other algorithms.</figDesc><table><row><cell>Method</cell><cell>Accuracy (%)</cell></row><row><cell>Jäger et al. (2015) [23] Qiu et al. (2018) [24]</cell><cell>66.75 83.92</cell></row><row><cell>Sudhakara et al. (2022) [25]</cell><cell>95.64</cell></row><row><cell>Proposed architecture without preprocessing</cell><cell>91.41</cell></row><row><cell>Proposed architecture with preprocessing</cell><cell>96.88</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Vgg16-based approach for side-scan sonar image analysis</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jaszcz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">27th International Conference on Information Technology</title>
				<imprint>
			<date type="published" when="2022">2022. 2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Energy consumption prediction model for smart homes via federated learning with lstm</title>
		<author>
			<persName><forename type="first">D</forename><surname>Połap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jaszcz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Consumer Electronics</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z.-J</forename><surname>Zha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luo</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1903.06150</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Fine-grained visual classification using self assessment classifier</title>
		<author>
			<persName><forename type="first">T</forename><surname>Do</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tjiputra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">D</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nguyen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2205.10529</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Maji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rahtu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kannala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blaschko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1306.5151</idno>
		<title level="m">Fine-grained visual classification of aircraft</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Destruction and construction learning for finegrained image recognition</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mei</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2019.00530</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<imprint>
			<date type="published" when="2019">2019. 2019</date>
			<biblScope unit="page" from="5152" to="5161" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Snapmix: Semantically proportional mixing for augmenting fine-grained data</title>
		<author>
			<persName><forename type="first">S</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tao</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2012.04846</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Learning multi-attention convolutional neural network for fine-grained image recognition</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luo</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICCV.2017.557</idno>
	</analytic>
	<monogr>
		<title level="m">2017 IEEE International Conference on Computer Vision (ICCV)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="5219" to="5227" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z.-J</forename><surname>Zha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luo</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1903.06150</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Fine-grained visual classification using self assessment classifier</title>
		<author>
			<persName><forename type="first">T</forename><surname>Do</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tjiputra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">D</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nguyen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2205.10529</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Bilinear pooling with poisoning detection module for automatic side scan sonar data analysis</title>
		<author>
			<persName><forename type="first">D</forename><surname>Połap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jaszcz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Wawrzyniak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zaniewicz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Improving transfer learning and squeeze-and-excitation networks for small-scale fine-grained fish image classification</title>
		<author>
			<persName><forename type="first">C</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zheng</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2018.2885055</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="78503" to="78512" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Heuristic feedback for generator support in generative adversarial network</title>
		<author>
			<persName><forename type="first">D</forename><surname>Połap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jaszcz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th International Conference on Agents and Artificial Intelligence</title>
				<meeting>the 16th International Conference on Agents and Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="863" to="870" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Deep learning underwater image color correction and contrast enhancement based on hue preservation</title>
		<author>
			<persName><forename type="first">C.-H</forename><surname>Yeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-H</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-H</forename><surname>Lin</surname></persName>
		</author>
		<idno type="DOI">10.1109/UT.2019.8734469</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Underwater Technology</title>
		<imprint>
			<biblScope unit="page" from="1" to="6" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A deep CNN method for underwater image enhancement</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICIP.2017.8296508</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Image Processing (ICIP)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1382" to="1386" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Adaptive weighted multi-discriminator CycleGAN for underwater image enhancement</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ko</surname></persName>
		</author>
		<idno type="DOI">10.3390/jmse7070200</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Marine Science and Engineering</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page">200</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Quan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Yi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2101.00991</idno>
		<title level="m">Underwater image enhancement based on deep learning and image formation model</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Fast underwater image enhancement for improved visual perception</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Islam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sattar</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1903.09766</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Hendrycks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1606.08415</idno>
		<title level="m">Gaussian error linear units (GELUs)</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Batch normalization: Accelerating deep network training by reducing internal covariate shift</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ioffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1502.03167</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Woo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">S</forename><surname>Kweon</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1807.06521</idno>
		<title level="m">CBAM: Convolutional block attention module</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van Der Maaten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">Q</forename><surname>Weinberger</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1608.06993</idno>
		<title level="m">Densely connected convolutional networks</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Croatian fish dataset: Fine-grained classification of fish species in their natural habitat</title>
		<author>
			<persName><forename type="first">J</forename><surname>Jäger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Simon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Denzler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Wolff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Fricke-Neuderth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kruschel</surname></persName>
		</author>
		<idno type="DOI">10.5244/C.29.MVAB.6</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page">7</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Fish classification using deep learning on small scale and low-quality images</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sudhakara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Meena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">R</forename><surname>Madhavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Anjaiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename></persName>
		</author>
		<ptr target="https://ijisae.org/index.php/IJISAE/article/view/2292" />
	</analytic>
	<monogr>
		<title level="j">International Journal of Intelligent Systems and Applications in Engineering</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">279</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
