<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Classification of a Small Imbalanced Dataset of Vine Leaves Images using Deep Learning Techniques</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Amjad</forename><surname>Balawi</surname></persName>
							<email>amjad.balawi20@gmail.com</email>
						</author>
						<author>
							<persName><forename type="first">Abdullah</forename><surname>Al Zoabi</surname></persName>
							<email>abdullah.al.zoabi@outlook.com</email>
						</author>
						<author>
							<persName><forename type="first">José</forename><forename type="middle">Luis</forename><surname>Seixas Junior</surname></persName>
							<email>jlseixasjr@inf.elte.hu</email>
						</author>
						<author>
							<persName><forename type="first">Tomáš</forename><surname>Horváth</surname></persName>
							<email>tomas.horvath@inf.elte.hu</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Department of Data Science and Engineering ELTE</orgName>
								<orgName type="department" key="dep2">t-labs.elte</orgName>
								<orgName type="institution">Eötvös Loránd University</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department" key="dep1">Faculty of Informatics</orgName>
								<orgName type="department" key="dep2">3in Research Group</orgName>
								<address>
									<settlement>Martonvásár</settlement>
									<country key="HU">Hungary</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Classification of a Small Imbalanced Dataset of Vine Leaves Images using Deep Learning Techniques</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">D1B848150F86173986B132F94F5667DB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Convolutional Neural Networks (CNNs) have become one of the most popular techniques in image classification. CNN models are usually trained on a large amount of data; this paper instead discusses the use of CNNs under data shortage and class imbalance. The study is conducted on a small dataset of vine leaf images in a classification task with five classes, using two different approaches. In the first approach, a simple CNN model is used, while in the second approach, the Visual Geometry Group (VGG) model with transfer learning is used. It is shown that combining deep learning techniques such as transfer learning, stratified sampling and data augmentation with state-of-the-art CNN models such as VGG gives good model performance, with up to 87% accuracy.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Deep Learning (DL) was inspired by the human brain and tries to simulate how humans learn. In DL, networks of neurons organized in multiple layers analyze large amounts of data to find the underlying structure or pattern. The main idea is to do this automatically, without explicit programming: the computer learns how to classify text, sounds and images. In Computer Vision (CV) tasks, the computer is trained on a huge number of images by encoding the image pixels into an internal representation, so that the classifier can find patterns in the input images <ref type="bibr" target="#b0">[1]</ref>.</p><p>DL outperforms other solutions in multiple domains, including speech, vision, video and natural language processing. It also reduces the need for the feature engineering stage, which is one of the most time-consuming tasks in machine learning <ref type="bibr" target="#b1">[2]</ref>. Another reason DL has become so popular in the last few years is the huge improvement in the computational power available for such tasks. However, one common problem is poor performance on unseen data (the test dataset) due to over-fitting; usually, a large dataset is required to increase model performance. Another problem is that it is hard to choose the right model for a given problem.</p><p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p><p>A Convolutional Neural Network (CNN or ConvNet) is a kind of Neural Network that is especially popular in image classification <ref type="bibr" target="#b2">[3]</ref>. Compared with an ordinary Neural Network, it has fewer connections and therefore fewer model parameters, which makes it less sensitive to over-fitting. The second reason why CNNs are powerful in computer vision tasks is parameter sharing: if a filter is useful on one part of the image, it could be useful on another part. 
Furthermore, CNNs preserve the spatial information of the image, which makes the classifier more robust against affine transformations like translation and rotation.</p><p>In many cases, especially nowadays, image data scarcity can be dealt with by frequent acquisition, but there are still situations in which acquisition is not easy or cannot be frequent, as in agriculture, where a plant cannot be grown in an hour or a day. There are also cases where synthetic images are far from real-world images, so training a model on them would give good results under controlled conditions but would not solve real problems.</p><p>The goal of this article is to find techniques, procedures or functions that can deal with the problems of using CNNs on small and imbalanced databases. To this end, two different CNN structures are implemented, combined with different DL techniques and procedures such as data augmentation, transfer learning, stratified sampling and model selection based on validation accuracy, also showing the transition from a simple CNN model to a state-of-the-art model like VGG.</p><p>This paper is organized as follows: Section 2 presents the techniques and definitions used in this work, followed by Section 3, which describes the steps for constructing the models. Section 4 shows the results obtained, and Section 5 presents the conclusions that can be inferred.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Proposed Approaches</head><p>There are many Machine Learning (ML) techniques that could be used for general classification problems, such as K-Nearest Neighbors (KNN), Logistic Regression, Support Vector Machines (SVM) and Artificial Neural Networks (ANN), but for image classification problems the most popular technique is the Convolutional Neural Network. The CNN is a class of ANN that has become dominant in various CV tasks <ref type="bibr" target="#b3">[4]</ref>, due to its ability to extract relevant features from raw data <ref type="bibr" target="#b4">[5]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">CNN and VGG architectures</head><p>In general, the CNN architecture is like an ordinary Neural Network, but it is stronger and deeper because it preserves the spatial information of images to overcome the problem of affine transformations. It also makes the classifier more robust by adding a stack of convolution layers just before the dense layers, and it reduces the number of trained parameters, which speeds up the learning process. The CNN architecture includes several building blocks, such as convolution layers, pooling layers, and fully connected layers. A typical architecture consists of repetitions of a stack of several convolution layers and a pooling layer, followed by one or more fully connected layers <ref type="bibr" target="#b3">[4]</ref>. Figure <ref type="figure" target="#fig_0">1</ref> shows a general overview of the CNN architecture. Convolution layers take the raw image as input, perform convolutions using trainable sliding windows of different sizes, typically called kernels, and produce a vector that serves as input to the dense layers. Each kernel has its own parameters, which are trained just like the dense layer parameters. The output of a convolution layer is the input to the next layer, which looks for a higher level of detail, and so on. The pooling layers come after a stack of one or more convolution layers. The purpose of pooling is to reduce the input size and to tolerate small translations; there are multiple types of pooling, such as Average, Min and Max pooling.</p><p>The Visual Geometry Group (VGG) network was introduced by Simonyan and Zisserman <ref type="bibr" target="#b5">[6]</ref> and is, in general, characterized by its simplicity, since it uses only 3 × 3 convolution layers stacked on top of each other with increasing depth. Max pooling is used in this network to reduce the volume size, or resolution. 
After the convolution layers, there are two dense layers with 4,096 neurons each, followed by a softmax classifier, which is a generalization of logistic regression to a multiclass probability distribution. There are two versions of VGG, 16 and 19, named after the number of weight layers in the network.</p><p>Simonyan and Zisserman found the convergence of the deeper networks, VGG16 and VGG19, quite challenging, so they trained smaller versions of the model, such as the one shown in Table <ref type="table" target="#tab_0">1</ref>. The main drawbacks of the VGG network are that it is slow to train and that its weights are quite large. Its depth and its number of fully connected neurons require a large amount of memory, which makes training a tedious task. However, in this paper, we suggest methods to overcome this issue and speed up the training process.</p></div>
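<div xmlns="http://www.tei-c.org/ns/1.0"><p>The effect of max pooling described above can be illustrated with a minimal sketch in plain Python. The function name max_pool_2x2 and the two small feature maps are hypothetical and are not taken from the authors' implementation; a 2 × 2 window with stride 2 is assumed. The second map contains the same values as the first, slightly displaced inside each window, and pooling yields the same smaller output for both.</p>
<p>
```python
def max_pool_2x2(image):
    """Apply 2x2 max pooling with stride 2 to a 2-D list of numbers.

    Assumes the height and width of `image` are even.
    """
    pooled = []
    for r in range(0, len(image), 2):
        row = []
        for c in range(0, len(image[0]), 2):
            # Keep only the largest value of each 2x2 window.
            window = (image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
# Same values, displaced by one pixel inside each 2x2 window.
shifted_map = [
    [3, 1, 0, 2],
    [2, 4, 1, 1],
    [1, 0, 6, 5],
    [2, 2, 8, 7],
]
print(max_pool_2x2(feature_map))  # [[4, 2], [2, 8]]
print(max_pool_2x2(shifted_map))  # [[4, 2], [2, 8]]
```
</p><p>The 4 × 4 input is reduced to 2 × 2, and the displacement inside each window does not change the result, which is the sense in which pooling tolerates small translations.</p></div>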
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Stratified Sampling</head><p>Stratified sampling is a probability sampling technique that takes the group size into account during the sampling process. The elements of the target population are divided into distinct groups, so-called "strata", and within each stratum the elements have similar characteristics <ref type="bibr" target="#b6">[7]</ref>. This technique is widely used in ML, especially when the data suffers from class imbalance <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11]</ref>. It is implemented in scikit-learn, a free ML library for Python. Stratified sampling was used while splitting the data into train, validation and test sets, by passing the target variable to the stratify parameter of the train_test_split function.</p></div>
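<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea behind a stratified split can be sketched in plain Python as follows. This is only an illustration of the principle, not the scikit-learn implementation used in the paper; the function stratified_split, the 80/10/10 fractions as defaults, the seed and the toy labels are assumptions made for the example.</p>
<p>
```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/validation/test, class by class."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical imbalanced labels: 100, 50 and 20 samples per class.
labels = ["a"] * 100 + ["b"] * 50 + ["c"] * 20
train, val, test = stratified_split(labels)
print(Counter(labels[i] for i in train))
print(Counter(labels[i] for i in val))
print(Counter(labels[i] for i in test))
```
</p><p>Each class contributes to every subset in proportion to its size, so the minority class is represented in the validation and test sets as well as in the training set.</p></div>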
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Data Augmentation</head><p>DL models, including CNNs, are usually trained on a large amount of data to achieve reasonable performance <ref type="bibr" target="#b11">[12]</ref>. In the case of data shortage, as in this paper, these models tend to over-fit the training data and lose their ability to generalize, which leads to bad performance on the test dataset. After the cleaning stage, our dataset contains around 1600 images; the training data was 80% of those images, while the remaining 20% was divided equally between the testing and validation datasets. This amount of data may not be enough to train a deep neural network to a good accuracy; thus, to increase accuracy and generalization and to prevent over-fitting, a data augmentation stage was added to the architecture.</p><p>Data augmentation means creating more training images from the existing ones by applying simple effects and affine transformations such as shifting, flipping, rotating and zooming. Augmentation increases the number of training images and leads to better generalization and a smoother training curve; it also provides information on the small deformations that images may contain due to the acquisition process <ref type="bibr" target="#b12">[13]</ref>. Figure <ref type="figure" target="#fig_1">2</ref> shows the result of applying data augmentation to the first image, resized to 256 × 256, which produced the second and third images through rotation and flipping. As can be seen, some shapes or features important for classification, which could be missed if images were acquired only with the leaf upright, now also become part of the training set.</p></div>
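<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of flipping and rotating an image, written in plain Python on a nested list, is given below. It is not the augmentation pipeline used in the experiments: the function names are hypothetical, the rotation is restricted to 90 degrees for simplicity, and the 2 × 2 image stands in for a real leaf photograph.</p>
<p>
```python
def flip_horizontal(image):
    """Mirror each row of a 2-D list (left-right flip)."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate a 2-D list by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original image plus three transformed copies."""
    return [image, flip_horizontal(image), flip_vertical(image),
            rotate_90_clockwise(image)]


# Hypothetical 2x2 "image" used only to make the effect visible.
tiny_image = [[1, 2],
              [3, 4]]
for variant in augment(tiny_image):
    print(variant)
```
</p><p>One input image yields four training samples, each showing the same content in a different orientation.</p></div>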
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Transfer Learning</head><p>Transfer learning is widely used in machine learning when there is not enough data for model training. The main idea of this technique is to take a model pretrained on a similar problem and apply it to the new problem <ref type="bibr" target="#b13">[14]</ref>. In most cases, the last few layers are fine-tuned and a simple dense or linear model is added on top.</p><p>The ImageNet dataset was used in this paper. It is a large visual dataset designed for object recognition tasks, containing more than 14 million images that have been hand-annotated to indicate what objects are pictured; for at least one million of the images, bounding boxes are also provided <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref>. ImageNet contains more than 20 thousand categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images <ref type="bibr" target="#b16">[17]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Research Methods</head><p>All strategies were implemented on the Google Colab cloud service using TensorFlow 2.0 with GPU support and the Keras API. TensorFlow is one of the best-known libraries commonly used for image classification in DL. It is an end-to-end open-source ML platform released by Google in 2015 for numerical processing and computation. Keras is an open-source neural-network library written in Python whose main purpose is to reduce code complexity; it offers a simple and efficient API able to run on top of TensorFlow, Theano and other DL frameworks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Dataset creation</head><p>In this study, images were collected by our department from fields in Hungary in the summer of 2019. This study has an industrial background in wine production, and the purpose is to predict the type of wine produced by each vine. Around 2200 images were collected by different people and devices, which produced images with different sizes, formats and backgrounds, so a filtering and preparation stage was needed. The dataset is divided into five classes, each named in Hungarian after the wine produced from the vine: "Cabernet Franc", "Kékfrankos", "Sárgamuskotály", "Szürkebarát", and "Tramini". Figure <ref type="figure" target="#fig_2">3</ref> shows eight random samples from the dataset with their original sizes.</p><p>The two main problems faced and discussed in this study are data shortage and class imbalance. Both can be seen in the histogram presented in Figure <ref type="figure" target="#fig_3">4</ref>, which shows how many images there are in the dataset for each class. Since the data were collected by non-experts and this is the first time the dataset has been used, the first step was to clean it by removing noisy images, as shown in Figure <ref type="figure" target="#fig_4">5</ref>, so that they would not affect the training process on a small dataset; Figure <ref type="figure" target="#fig_5">6</ref> shows the distribution of the cleaned dataset. Then all the different image formats were unified into a common format (PNG), which was selected to keep as much information as possible in the images, since it uses a lossless compression algorithm. After that, the images were resized into two resolutions, 224 × 224 and 256 × 256 pixels, which are commonly preferred by different CNN architectures such as VGG16 and ResNet34. In order to speed up the training process, the raw images were converted into NumPy arrays, which support vectorized operations. Figure <ref type="figure" target="#fig_7">7</ref> shows an image sample from the cleaned dataset. As can be seen from the histogram, the dataset is relatively small, especially for deep learning models, and it suffers from class imbalance. 
So, in order to tackle these issues, the data was split into training, validation and testing sets using stratified sampling, which takes samples from each class in proportion to the class size <ref type="bibr" target="#b6">[7]</ref>.</p><p>The split used in the experiments was 80%-10%-10% for the training, validation (used for hyperparameter tuning) and testing sets, respectively. We used this split because the dataset is relatively small, and we incorporated stratified sampling, which takes samples in proportion to the class size, for better generalization. After splitting, the data was normalized using a MinMax scaler in order to speed up the training process by making the objective function rounder, smoother and easier to optimize <ref type="bibr" target="#b17">[18]</ref>.</p></div>
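<div xmlns="http://www.tei-c.org/ns/1.0"><p>MinMax scaling maps each value x to (x - min) / (max - min), so that all values fall between 0 and 1. The following plain-Python sketch illustrates the formula; the function min_max_scale is hypothetical, and the fixed range of 0 to 255 assumes 8-bit pixel intensities, which the paper does not state explicitly.</p>
<p>
```python
def min_max_scale(values, low=None, high=None):
    """Rescale numbers to the range [0, 1].

    `low` and `high` default to the minimum and maximum of `values`.
    """
    low = min(values) if low is None else low
    high = max(values) if high is None else high
    span = high - low
    if span == 0:
        return [0.0 for _ in values]  # constant input: avoid dividing by zero
    return [(v - low) / span for v in values]


# Hypothetical 8-bit pixel intensities (assumed range 0..255).
pixels = [0, 51, 102, 255]
print(min_max_scale(pixels, low=0, high=255))  # [0.0, 0.2, 0.4, 1.0]
```
</p><p>Passing a fixed range keeps the scaling identical for the training, validation and test sets, whereas leaving low and high unset scales each input by its own extremes.</p></div>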
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Simple CNN Model</head><p>This architecture was built by trial and error, starting from a straightforward model inspired by the LeNet-5 <ref type="bibr" target="#b18">[19]</ref> architecture.</p><p>The first model consisted of two sets of one convolution layer and one pooling layer, followed by two dense layers, but it showed bad accuracy due to under-fitting. So, layers were added, one layer per experiment, until no improvement was detected.</p><p>Then, multiple experiments were made by trying different combinations of kernel sizes, hidden layer sizes and pooling types. The most accurate model, based on two-class classification performance, was the following:</p><p>• Three convolution blocks with 4, 8, and 16 filters.</p><p>• Each block consists of two convolutional layers followed by a Max pooling layer.</p><p>• A stack of three dense layers with 64, 32 and 5 units, respectively.</p></div>
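<div xmlns="http://www.tei-c.org/ns/1.0"><p>To give a feeling for the size of the architecture listed above, the sketch below counts its trainable parameters in plain Python. Several details are assumptions that the text does not specify: 3 × 3 kernels, padding that preserves the spatial size, 2 × 2 max pooling, and a 256 × 256 RGB input. The function names are hypothetical, and the resulting number is only valid under these assumptions.</p>
<p>
```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of one 2-D convolution layer."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus biases of one fully connected layer."""
    return inputs * units + units


def simple_cnn_params(size=256, channels=3, kernel=3,
                      block_filters=(4, 8, 16), dense_units=(64, 32, 5)):
    """Count parameters of three conv blocks followed by three dense layers."""
    total = 0
    for filters in block_filters:
        # Two convolution layers per block ('same' padding keeps the size).
        total += conv_params(kernel, channels, filters)
        total += conv_params(kernel, filters, filters)
        channels = filters
        size //= 2  # 2x2 max pooling halves height and width
    inputs = size * size * channels  # flattened feature map
    for units in dense_units:
        total += dense_params(inputs, units)
        inputs = units
    return total


print(simple_cnn_params())  # 1055513
```
</p><p>Under these assumptions, almost all parameters belong to the first dense layer, while the six convolution layers together account for only a few thousand, which illustrates the effect of parameter sharing.</p></div>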
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">VGG</head><p>As with the simple model, several attempts were made to find a better starting point. In the case of the VGG model, transfer learning from the ImageNet dataset was the very first step and, from different experiments, it was noticeable that training only the last few layers of the VGG model provided the best results. The reason for this behavior is that, in CNNs, the first few layers capture low-level features, which in most cases are useful in any image classification problem, whereas the last few layers capture high-level features, which are, in most cases, specific to the dataset (problem). At the top of the model, the 1000-class output layer, which is specific to the ImageNet dataset, was removed and a final dense layer with 5 classes was added. The Adam optimizer with a learning rate of 0.001 was used.</p><p>The other technique used to handle the class imbalance issue was data augmentation on the training set. For reproducibility, a random seed was set while splitting the data into training, validation and test sets, and the model weights with the lowest validation loss were saved in HDF5 format.</p></div>
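<div xmlns="http://www.tei-c.org/ns/1.0"><p>The strategy of training only the last few layers can be sketched as a simple plan that marks each layer as frozen or trainable. The layer names below follow the naming of the VGG16 model shipped with Keras; which layers were actually unfrozen is not stated in the text, so treating the fifth block as trainable is an assumption, and the name dense_5_classes for the new output layer is hypothetical. In Keras, such a plan would correspond to setting the trainable attribute of each layer.</p>
<p>
```python
# Convolution and pooling layer names as used by the Keras VGG16 model.
VGG16_LAYERS = [
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]


def trainable_plan(layer_names, trainable_prefixes=("block5",)):
    """Map each layer name to True (fine-tune) or False (keep frozen)."""
    return {name: name.startswith(trainable_prefixes) for name in layer_names}


plan = trainable_plan(VGG16_LAYERS)
plan["dense_5_classes"] = True  # hypothetical new 5-unit output layer
frozen = [name for name, trainable in plan.items() if not trainable]
print(len(frozen), "frozen layers;",
      len(plan) - len(frozen), "trainable layers")
```
</p><p>The early blocks keep their ImageNet weights, and only the last block and the new output layer are updated on the vine leaf images.</p></div>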
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results</head><p>For the simple CNN model, the best result obtained among all experiments was 90%, 90% and 90% for Accuracy, Precision and Recall, respectively, on the pair of classes "Szürkebarát" and "Tramini". Training on two classes was chosen as a starting point because it is not time-consuming and allows more trials. It also makes it possible to divide the five-class dataset into multiple two-class datasets and to monitor the model performance across them.</p><p>Over-fitting is noticeable in Figure <ref type="figure" target="#fig_8">8</ref>, but at this point there was no need to seek improvement, since two-class classification was not the intended task; a robust model was of greater interest. While verifying the model on four classes, two problems were faced: severe over-fitting, and the largest class tended to have a large number of False Positives, which leads to bad Precision and Recall. At this point, some steps were taken to mitigate these problems:</p><p>• The number of epochs was increased to 300.</p><p>• Every 50 epochs, the train and validation datasets were merged and randomly split again into train and validation datasets.</p><p>• While training, the model from the epoch with the best validation accuracy was saved. At the end, it was compared with the final model based on test accuracy.</p><p>Among all the experiments with four classes, the best results were 88.4%, 88.4% and 88.1% for Accuracy, Precision and Recall, respectively. Figure <ref type="figure" target="#fig_10">9</ref> shows the performance of the model while training with four classes, based on training and validation accuracy and loss through the epochs.</p><p>Finally, the model was trained with five classes, and the best results among all experiments were 83.8%, 84.4% and 84% for Accuracy, Precision and Recall, respectively.</p><p>Figure <ref type="figure" target="#fig_11">10</ref> shows the same information as Figure <ref type="figure" target="#fig_10">9</ref> while training the model with all five available classes using the simple model.</p><p>For the VGG model, some transformations (width shift, height shift, zooming, shearing and rotation) were used in data augmentation, which led the model to achieve almost 87% accuracy on the test set, which served as an unbiased estimator. Precision, Recall and F1-score also reached about the same value. Table <ref type="table" target="#tab_1">2</ref> shows the Precision, Recall, and F1-score using the VGG model. 
The metrics used to measure the model's performance were chosen because they take the class imbalance issue into account and because of the general intuition behind them: precision reflects how much noisy data is returned, in other words, it is related to False Positives, while recall reflects how much good data is missed; finally, the F1-score is the harmonic mean of precision and recall. The main reason the harmonic mean is used in the F1-score is to penalize a large difference between precision and recall. For example, with 100% precision and 0% recall, the F1-score would be 0%, while the arithmetic mean would be 50%.</p></div>
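<div xmlns="http://www.tei-c.org/ns/1.0"><p>The difference between the harmonic and the arithmetic mean can be checked with a few lines of plain Python. The function names are hypothetical; the first call reproduces the example of 100% precision and 0% recall, and the second uses the precision and recall reported for class 0 in Table 2.</p>
<p>
```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def arithmetic_mean(precision, recall):
    """Plain average, shown only for comparison with the F1-score."""
    return (precision + recall) / 2


print(f1_score(1.0, 0.0), arithmetic_mean(1.0, 0.0))  # 0.0 0.5
print(round(f1_score(0.89, 0.93), 2))  # 0.91
```
</p><p>When precision and recall are close, as for class 0, the F1-score lies between them; when one of them collapses, the F1-score collapses as well, which the arithmetic mean would hide.</p></div>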
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>In this research, we investigated different deep learning techniques to overcome data shortage and class imbalance issues. Through experiments, we observed that even deep learning models, which require a lot of data, can perform very well on a small imbalanced dataset when techniques such as stratified sampling, data augmentation, and transfer learning are used. In our first experiment, using a simple CNN model, we obtained an accuracy of around 83.8% and almost the same values for the other metrics (Precision, Recall, and F1-score), while in the second experiment a VGG model was used with a combination of different techniques, reaching very good results of about 87% for accuracy and the other metrics.</p><p>Results indicate that, even if a large amount of data is preferable, it is possible to overcome the previously mentioned issues with satisfactory results. In addition, the applied techniques helped to avoid over-fitting, making the models not database dependent.</p><p>It is also worth noting that, in cases where the required level of accuracy is very high, above 90% or 95%, the techniques applied may not be recommended without further database analysis, since these techniques may sacrifice accuracy to avoid other problems.</p><p>It is also important to note that one of the models is already known in the literature and the other did not require any major framework to be built, only systematic and incremental analysis while interpreting the results obtained at each step.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of the CNN architecture.</figDesc><graphic coords="2,56.69,290.11,231.01,68.22" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Example of Data Augmentation after Resizing the Original Image to 256 × 256.</figDesc><graphic coords="3,56.69,324.75,231.04,283.47" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Random samples from the Dataset with their Original Sizes.</figDesc><graphic coords="4,56.69,80.50,231.02,114.98" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Histogram of the Raw Dataset.</figDesc><graphic coords="4,56.69,238.49,231.02,167.58" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Example of a Noisy Image.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Histogram of the cleaned Dataset.were resized into two resolutions 224 × 224 and 256 × 256 pixels which are practically preferred by different CNN architectures such as VGG16 and ResNet34. In order to speed up the training process, the raw images were converted into NumPy which is a vectorized implementation. Figure7shows an image sample from cleaned dataset.</figDesc><graphic coords="4,307.56,339.34,231.03,173.27" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Example from the Prepared Dataset Resized to 256 × 256.</figDesc><graphic coords="4,56.69,508.47,231.03,170.59" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Model Performance in Two Classes. classification was not the intended classification, a robust model was rather interesting. While verifying the model in four classes, two problems were faced, huge over-fitting and the largest class tend to have a large number of False Positives which leads to bad Precision and Recall. At this point, some steps were taken to smooth the effects of the problems:</figDesc><graphic coords="5,307.56,80.50,231.02,114.86" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head></head><label></label><figDesc>brings the performance of the model while training with four classes, based on training and validation accuracy and loss through the epochs.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Model Performance in Four Classes.</figDesc><graphic coords="5,307.56,504.40,231.02,116.95" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Figure 10 :</head><label>10</label><figDesc>Figure 10: Model Performance in Five Classes. achieve almost 87% accuracy on the test set, which served as an unbiased estimator. Precision, Recall and F1-score also reached about the same value.</figDesc><graphic coords="6,56.69,80.50,231.02,115.31" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>VGG architecture    </figDesc><table><row><cell cols="2">Convolution network configuration</cell></row><row><cell cols="2">11 weights layer 16 weights layer</cell></row><row><cell cols="2">Input (224 × 224) RGB image</cell></row><row><cell>Conv3-64</cell><cell>Conv3-64</cell></row><row><cell></cell><cell>Conv3-64</cell></row><row><cell cols="2">Max pooling</cell></row><row><cell>Conv3-128</cell><cell>Conv3-128</cell></row><row><cell></cell><cell>Conv3-128</cell></row><row><cell cols="2">Max pooling</cell></row><row><cell>Conv3-256</cell><cell>Conv3-256</cell></row><row><cell>Conv3-256</cell><cell>Conv3-256</cell></row><row><cell></cell><cell>Conv1-256</cell></row><row><cell cols="2">Max pooling</cell></row><row><cell>Conv3-512</cell><cell>Conv3-512</cell></row><row><cell>Conv3-512</cell><cell>Conv3-512</cell></row><row><cell></cell><cell>Conv1-512</cell></row><row><cell cols="2">Max pooling</cell></row><row><cell>Conv3-512</cell><cell>Conv3-512</cell></row><row><cell>Conv3-512</cell><cell>Conv3-512</cell></row><row><cell></cell><cell>Conv1-512</cell></row><row><cell cols="2">Max pooling</cell></row><row><cell cols="2">FC-4096</cell></row><row><cell cols="2">FC-4096</cell></row><row><cell cols="2">FC-1000</cell></row><row><cell cols="2">SoftMax layer</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Precision, Recall and F1-score of the model</figDesc><table><row><cell cols="4">Class Precision Recall f1-score</cell></row><row><cell>0</cell><cell>0.89</cell><cell>0.93</cell><cell>0.91</cell></row><row><cell>1</cell><cell>0.82</cell><cell>0.90</cell><cell>0.86</cell></row><row><cell>2</cell><cell>0.92</cell><cell>0.79</cell><cell>0.85</cell></row><row><cell>3</cell><cell>0.93</cell><cell>0.84</cell><cell>0.88</cell></row><row><cell>4</cell><cell>0.80</cell><cell>0.86</cell><cell>0.83</cell></row><row><cell>accuracy</cell><cell></cell><cell></cell><cell>0.87</cell></row><row><cell>macro avg</cell><cell>0.87</cell><cell>0.86</cell><cell>0.87</cell></row><row><cell>weighted avg</cell><cell>0.87</cell><cell>0.87</cell><cell>0.87</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgement</head><p>We would like to thank Telekom who has us as one of its technology partners on Telekom Innovation Laboratories and the Tempus Public Foundation for the financial support through the Stipendium Hungaricum Scholarship Programme.</p><p>The research has been supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.2-16-2017-00013, Thematic Fundamental Research Collaborations Grounding Innovation in Informatics and Infocommunications).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">On definition of deep learning</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Gupta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">World Automation Congress (WAC)</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Deep learning: definition and perspectives for thoracic imaging</title>
		<author>
			<persName><forename type="first">Guillaume</forename><surname>Chassagnon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maria</forename><surname>Vakalopolou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nikos</forename><surname>Paragios</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marie-Pierre</forename><surname>Revel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">European Radiology</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="2021" to="2030" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Conceptual understanding of convolutional neural network - a deep learning approach</title>
		<author>
			<persName><forename type="first">Sakshi</forename><surname>Indolia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anil</forename><forename type="middle">Kumar</forename><surname>Goswami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">P</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pooja</forename><surname>Asopa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Computational Intelligence and Data Science</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">132</biblScope>
			<biblScope unit="page" from="679" to="688" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Convolutional neural networks: an overview and application in radiology</title>
		<author>
			<persName><forename type="first">Rikiya</forename><surname>Yamashita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mizuho</forename><surname>Nishio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Do</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kaori</forename><surname>Togashi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Insights into Imaging</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<date type="published" when="2018-06">June 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">A General Introduction to Data Analytics</title>
		<author>
			<persName><forename type="first">J</forename><surname>Moreira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Carvalho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Horvath</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<publisher>Wiley</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Very deep convolutional networks for large-scale image recognition</title>
		<author>
			<persName><forename type="first">Karen</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><surname>Zisserman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1409.1556v6</idno>
	</analytic>
	<monogr>
		<title level="j">ICLR</title>
		<imprint>
			<date type="published" when="2015-04-10">10 April 2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">Van</forename><forename type="middle">L</forename><surname>Parsons</surname></persName>
		</author>
		<title level="m">Stratified Sampling</title>
				<imprint>
			<publisher>American Cancer Society</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1" to="11" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Stratified sampling using cluster analysis: A sample selection strategy for improved generalizations from experiments</title>
		<author>
			<persName><forename type="first">Elizabeth</forename><surname>Tipton</surname></persName>
		</author>
		<idno type="PMID">24647924</idno>
	</analytic>
	<monogr>
		<title level="j">Evaluation Review</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="109" to="139" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Stratified sampling meets machine learning</title>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Lang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edo</forename><surname>Liberty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Konstantin</forename><surname>Shmakov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48</title>
				<meeting>the 33rd International Conference on International Conference on Machine Learning - Volume 48</meeting>
		<imprint>
			<publisher>JMLR.org</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="2320" to="2329" />
		</imprint>
	</monogr>
	<note>ICML&apos;16</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Semi-supervised learning for semantic relation classification using stratified sampling strategy</title>
		<author>
			<persName><forename type="first">Longhua</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guodong</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fang</forename><surname>Kong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Qiaoming</forename><surname>Zhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2009 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2009-08">August 2009</date>
			<biblScope unit="page" from="1437" to="1445" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data</title>
		<author>
			<persName><forename type="first">M</forename><surname>Goldstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Uchida</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PLoS ONE</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page">e0152173</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Improving deep learning using generic data augmentation</title>
		<author>
			<persName><forename type="first">Luke</forename><surname>Taylor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Geoff</forename><surname>Nitschke</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1708.06020</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">The effectiveness of data augmentation in image classification using deep learning</title>
		<author>
			<persName><forename type="first">Luis</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jason</forename><surname>Wang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A survey of transfer learning</title>
		<author>
			<persName><forename type="first">Karl</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Taghi</forename><forename type="middle">M</forename><surname>Khoshgoftaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dingding</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Big Data</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2016-05">May 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">New computer vision challenge wants to teach robots to see in 3D</title>
	</analytic>
	<monogr>
		<title level="j">New Scientist</title>
		<imprint>
			<date type="published" when="2017-04-07">7 April 2017</date>
		</imprint>
	</monogr>
	<note>Retrieved 3 February 2018</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">For Web Images, Creating New Technology to Seek and Find</title>
		<author>
			<persName><forename type="first">John</forename><surname>Markoff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The New York Times</title>
		<imprint>
			<date type="accessed" when="2018-02-03">Retrieved 3 February 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">ImageNet large scale visual recognition challenge</title>
		<author>
			<persName><forename type="first">Olga</forename><surname>Russakovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jia</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hao</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jonathan</forename><surname>Krause</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sanjeev</forename><surname>Satheesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sean</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhiheng</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrej</forename><surname>Karpathy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aditya</forename><surname>Khosla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Bernstein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Vision</title>
		<imprint>
			<biblScope unit="volume">115</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="211" to="252" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Big data preprocessing: methods and prospects</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ramírez-Gallego</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luengo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Big Data Anal</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Gradient-based learning applied to document recognition</title>
		<author>
			<persName><forename type="first">Yann</forename><surname>LeCun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Léon</forename><surname>Bottou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Haffner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the IEEE</title>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="2278" to="2324" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
