<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Classification of a Small Imbalanced Dataset of Vine Leaves Images using Deep Learning Techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amjad Balawi</string-name>
          <email>amjad.balawi20@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abdullah Al Zoabi</string-name>
          <email>abdullah.al.zoabi@outlook.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Luis Seixas Junior</string-name>
          <email>jlseixasjr@inf.elte.hu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomáš Horváth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Data Science and Engineering ELTE - Eötvös Loránd University</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>Convolutional Neural Networks (CNNs) have become one of the most popular techniques in image classification. CNN models are usually trained on large amounts of data; in this paper, however, we discuss the use of CNNs under data shortage and class imbalance. The study is conducted on a small dataset of vine leaf images in a five-class classification task, using two different approaches. In the first approach, a simple CNN model is used, while in the second, the Visual Geometry Group (VGG) model with transfer learning is used. We show that combining deep learning techniques such as transfer learning, stratified sampling, and data augmentation with state-of-the-art CNN models such as VGG yields relatively good model performance, with up to 87% accuracy.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Deep Learning (DL) was inspired by the human brain and tries to simulate how humans learn. In DL, networks of neurons organized in multiple layers analyze large amounts of data to find the underlying structure or patterns. The main idea is to do this automatically, without explicit programming: the computer learns how to classify text, sounds and images. In Computer Vision (CV) tasks, the computer is trained on a huge number of images by encoding the image pixels into an internal representation, so the classifier can find the patterns in the input images [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        DL outperforms other solutions in multiple domains, including speech, vision, video and natural language processing. It also reduces the need for the feature engineering stage, which is one of the most time-consuming tasks in machine learning [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Another reason DL has become so prominent in the last few years is the huge improvement in the computational power that can be utilized to accomplish such tasks. However, one common problem is performing badly on unseen data (the test dataset) due to over-fitting; usually, a large dataset is required to increase model performance. Another problem is that it is hard to choose the right model for any given problem.
      </p>
      <p>
        The Convolutional Neural Network (CNN or ConvNet) is a kind of Neural Network that is especially popular in image classification [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It has fewer connections, and therefore fewer model parameters, making it less sensitive to over-fitting. The second reason why CNNs are powerful in computer vision tasks is parameter sharing: if a filter is useful on one part of the image, it is likely to be useful on another. Furthermore, CNNs preserve the spatial information of the image, which makes the classifier more robust against affine transformations like translation and rotation.
      </p>
      <p>In many cases, especially nowadays, image data scarcity can be dealt with by frequent acquisition, but there are still situations in which acquisition is not easy or cannot be frequent, as in agriculture, where a plant cannot be grown in an hour or a day. There are also cases where synthetic image creation is far from real-world images, so training a model in such a situation would produce good controlled results but would not solve real problems.</p>
      <p>The goal of this article is to find techniques, procedures or functions that can deal with the problems of using CNNs on small and imbalanced datasets. For this, two different CNN structures are implemented, in combination with different DL techniques and procedures such as data augmentation, transfer learning, stratified sampling and model picking based on validation accuracy, also showing the transition from a simple CNN model to a state-of-the-art model like VGG.</p>
      <p>This paper is organized as follows: Section 2 presents the techniques and definitions used in the proposals of this work, followed by Section 3, which describes the steps for constructing the models. Section 4 shows the results obtained, and Section 5 the conclusions that can be inferred.</p>
    </sec>
    <sec id="sec-2">
      <title>Proposed Approaches</title>
      <p>
        There are many Machine Learning (ML) techniques that could be used for general classification problems, like K-Nearest Neighbors (KNN), Logistic Regression, Support Vector Machines (SVM) and Artificial Neural Networks (ANN), but for image classification problems the most popular technique is the Convolutional Neural Network. CNNs are a class of ANNs that has become dominant in various CV tasks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], due to their ability to extract relevant features from raw data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>CNN and VGG architectures</title>
        <p>
          In general, the CNN architecture is like that of an ordinary Neural Network, but it is stronger and deeper because it preserves the spatial information of images to overcome the problem of affine transformations. It also makes the classifier more robust by adding a stack of convolution layers just before the dense layers; besides, it reduces the number of trained parameters, which speeds up the learning process. The CNN architecture includes several building blocks, such as convolution layers, pooling layers, and fully connected layers. A typical architecture consists of repetitions of a stack of several convolution layers and a pooling layer, followed by one or more fully connected layers [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>Figure 1 shows a general overview of the CNN architecture. Convolution layers take the raw image as input, perform convolutions using trainable sliding windows of different sizes, typically named kernels, and produce a vector that goes as input to the dense layers. Each kernel has its own parameters, which are trained just like the dense layer parameters; the output of a convolution layer goes as input to the next layer, which looks for a higher level of detail in the input, and so on. The pooling layers come after a stack of one or more convolution layers; the purpose of pooling is to reduce the input size and overcome small translations. There are multiple types of pooling, like Average, Min and Max pooling.</p>
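        <p>The pooling operation described above can be sketched in a few lines of NumPy; the following is a minimal illustration (not the paper's code) of 2 × 2 Max pooling with stride 2:</p>

```python
import numpy as np

def max_pool_2x2(img):
    """2x2 max pooling with stride 2 on a 2-D array."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.array([[1, 2, 0, 0],
                [3, 4, 0, 0],
                [0, 0, 5, 6],
                [0, 0, 7, 8]])
pooled = max_pool_2x2(img)
# pooled:
# [[4 0]
#  [0 8]]
```

        <p>Each 2 × 2 neighborhood collapses to its maximum, so the output is a quarter of the input size and is unchanged by small shifts of the strongest activation within each window.</p>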
        <p>
          The Visual Geometry Group (VGG) network was introduced by Simonyan and Zisserman [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and is, in general, characterized by its simplicity, since it only uses 3 × 3 convolution layers stacked on top of each other with increasing depth. To reduce the volume size or resolution, max pooling is used in this network. After the convolution layers, there are two dense layers with 4,096 neurons each, followed by a softmax classifier, which is a generalization of logistic regression to support a multiclass probability distribution. There are two versions of VGG, 16 and 19, referring to the number of weight layers in the network.
        </p>
        <p>
          Simonyan and Zisserman found the convergence of VGG16 and VGG19 on the deeper networks quite challenging, so they trained smaller versions of the model, like the one shown in Table 1. The main drawbacks of the VGG network are that it is slow to train and its weights are quite large; the depth and the number of fully connected neurons make it require a large amount of memory, which makes training a tedious task. However, in this paper, we suggest methods to overcome this issue and speed up the training process.
        </p>
        <p>
          Stratified sampling is a probability sampling technique that takes the group size into account during the sampling process. The elements in the target population are divided into distinct groups, or so-called “strata”, where within each stratum the elements have similar characteristics [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. This technique is widely used in ML, especially when the data suffers from the class imbalance issue [
          <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8, 9, 10, 11</xref>
          ]. This sampling technique is implemented in the scikit-learn library, a free ML library for Python. It was used while splitting the data into train, validation and test sets, through the stratify attribute of the train_test_split function, passing the target variable from which the sample was required.
        </p>
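        <p>As an illustration, a minimal scikit-learn sketch of such a stratified split; the class labels and sizes here are hypothetical stand-ins for the five leaf classes, not the actual data:</p>

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels standing in for the five leaf classes.
y = ["a"] * 50 + ["b"] * 30 + ["c"] * 10 + ["d"] * 6 + ["e"] * 4
X = list(range(len(y)))

# stratify=y keeps each class's proportion the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(Counter(y_test))  # each class contributes roughly 20% of its members
```

        <p>Without stratify, a random 20% split of such skewed data could easily miss the two smallest classes entirely.</p>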
      </sec>
      <sec id="sec-2-2">
        <title>Data Augmentation</title>
        <p>
          DL models, including CNNs, are usually trained on a large amount of data to achieve reasonable performance [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]; in case of data shortage, as in this paper, these models tend to over-fit the training data and lose their generalization ability, which leads to bad performance on the test dataset. After the cleaning stage, our dataset contains around 1600 images; the training data was 80% of those images, while the remaining 20% were divided equally into testing and validation datasets. This amount of data may not be enough to train a deep neural network to good accuracy, so to increase accuracy and generalization and to prevent over-fitting, a data augmentation stage was added to the architecture.
        </p>
        <p>
          Data augmentation means creating more training images from the existing ones by applying simple effects and affine transformations like shifting, flipping, rotating, zooming and so on. This augmentation increases the number of training images and leads to better generalization and a smoother training curve; it also provides information on small deformations that images may contain due to acquisition processes [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Figure 2 shows the result of applying data augmentation to the first image, resized to 256 × 256, which produced the second and third images by applying rotation and flipping.
        </p>
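        <p>The idea can be sketched in plain NumPy (the experiments themselves used Keras augmentation utilities; the random array below is only a stand-in for a resized leaf image):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
leaf = rng.random((256, 256, 3))  # stand-in for a 256x256 RGB leaf image

# Simple label-preserving transformations: each one yields an extra
# training sample with the same class label.
augmented = [
    np.fliplr(leaf),            # horizontal flip
    np.flipud(leaf),            # vertical flip
    np.rot90(leaf),             # 90-degree rotation
    np.roll(leaf, 10, axis=1),  # small horizontal shift
]
```

        <p>One original image thus becomes five training samples, all showing the same leaf under different plausible acquisition conditions.</p>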
        <p>As can be seen, some important shapes or features for classification, which could have been discarded if the acquisition had been made only with the leaf upright, now also become part of the training set.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Transfer Learning</title>
        <p>
          Transfer Learning is widely used in machine learning when there is not enough data for model training. The main idea of this technique is to take a pretrained model that was trained on a similar problem and apply it to the new problem [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. In most cases, the last few layers are refined and a simple dense or linear model is added on top.
        </p>
        </p>
        <p>
          The ImageNet dataset was used in this paper. It is a large visual dataset designed for object recognition tasks, containing more than 14 million images, at least one million of which have been hand-annotated to indicate which objects are pictured; bounding boxes are also provided [
          <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
          ]. ImageNet contains more than 20 thousand categories, with typical categories, such as “balloon” or “strawberry”, consisting of several hundred images [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
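        <p>A minimal Keras sketch of this transfer-learning setup: the ImageNet-pretrained VGG16 base with its 1000-class head removed, most layers frozen, and a new 5-class head on top. In the paper's setting the base is loaded with weights="imagenet"; weights=None is used here only to keep the sketch runnable offline, and the number of unfrozen layers is an assumption:</p>

```python
import tensorflow as tf

# Convolutional base of VGG16 without the 1000-class ImageNet head.
# The paper uses weights="imagenet"; None here avoids the weight download.
base = tf.keras.applications.VGG16(
    weights=None, include_top=False, input_shape=(224, 224, 3))

# Freeze all but the last few layers: low-level features transfer as-is,
# high-level (problem-specific) features are refined on the new data.
for layer in base.layers[:-4]:
    layer.trainable = False

# New 5-class head for the vine-leaf labels.
x = tf.keras.layers.Flatten()(base.output)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(base.input, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
```
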
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Research Methods</title>
      <p>All strategies were implemented on the Google Colab cloud service using TensorFlow 2.0 with GPU support and the Keras API abstraction framework. TensorFlow is one of the best-known libraries commonly used for image classification in DL; it is an end-to-end open-source ML platform developed by Google in 2015 for numerical processing and computation. Keras is an open-source neural-network library written in Python, with the main purpose of reducing code complexity; it offers a simple, efficient API able to run on top of TensorFlow, Theano and other DL frameworks.</p>
      <sec id="sec-3-1">
        <title>Dataset creation</title>
        <p>In this study, images were collected by our department in the fields of Hungary in the summer of 2019. The study has an industrial background in wine production, and the purpose is to predict the type of wine produced by each vine. Around 2200 images were collected by different people and devices, producing images with different sizes, formats and backgrounds, so a filtering and preparation stage was needed. The dataset is divided into five classes, each named in Hungarian after the wine produced from the vine: “Cabernet Franc”, “Kékfrankos”, “Sárgamuskotály”, “Szürkebarát”, and “Tramini”. Figure 3 shows eight random samples from the dataset at their original sizes.</p>
        <p>The two main problems faced and discussed in this study are data shortage and class imbalance; both can be seen in the histogram presented in Figure 4, which shows how many images the dataset contains for each class.</p>
        <p>Since the data was collected by non-experts and this is the first time it is used, the first step was to clean the dataset by removing noisy images, as shown in Figure 5, so they would not affect the training process on a small dataset; Figure 6 shows the distribution of the cleaned dataset.</p>
        <p>Then all the different image formats were unified into a common format (PNG), which was selected to keep as much information as possible in the images, since it uses a lossless compression algorithm. After that, the images were resized to two resolutions, 224 × 224 and 256 × 256 pixels, which are preferred in practice by different CNN architectures such as VGG16 and ResNet34. To speed up the training process, the raw images were converted into NumPy arrays, which allow vectorized operations. Figure 7 shows an image sample from the cleaned dataset.</p>
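        <p>This preparation step can be sketched with Pillow and NumPy; the blank image below is only a stand-in for one raw photo, and the file name is illustrative:</p>

```python
import numpy as np
from PIL import Image

# Stand-in for one raw photo of arbitrary size.
photo = Image.new("RGB", (1024, 768), color=(30, 120, 40))

# Unify resolution, then store losslessly as PNG.
photo = photo.resize((224, 224))
photo.save("leaf.png")

# Load back as a NumPy array for a fast, vectorized training pipeline.
arr = np.asarray(Image.open("leaf.png"))
print(arr.shape)  # (224, 224, 3)
```
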
        <p>
          As is noticeable from the histogram, the dataset is relatively small, especially for deep learning models, and the data suffers from the class imbalance issue. In order to tackle these issues, the data was split into training, validation and testing sets using stratified sampling, which takes samples from each class proportional to the class size [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          The split used in the experiments was 80%-10%-10% for the training, validation (used for hyperparameter tuning) and testing sets, respectively. We used this split because the data is relatively small, and we incorporated stratified sampling, which took the samples proportional to the class size for better generalization. After splitting, the data was normalized using a MinMax scaler in order to speed up the training process by making the objective function smoother and easier to optimize [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>
          The simple architecture was built by trial and error, starting from a straightforward model inspired by the LeNet-5 [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] architecture.
        </p>
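        <p>The 80%-10%-10% stratified split can be obtained with two consecutive train_test_split calls; the per-class sizes below are hypothetical stand-ins for the roughly 1600 cleaned images:</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical class sizes mimicking the cleaned ~1600-image dataset.
y = np.repeat(np.arange(5), [600, 400, 300, 200, 100])
X = np.arange(len(y))

# 80% train; the remaining 20% is halved into validation and test
# (10% each), stratifying both splits to preserve class proportions.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 1280 160 160
```

        <p>The MinMax normalization step then amounts to scaling the 8-bit pixel arrays into [0, 1], e.g. X_train / 255.0.</p>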
        <p>The first model consisted of two sets of one convolution layer and one pooling layer, followed by two dense layers, but it showed bad accuracy due to under-fitting. So layers were added, one layer per experiment, until no further improvement was detected.</p>
        <p>Then multiple experiments were made, trying different combinations of kernel sizes, hidden layer sizes and pooling types. The model with the best accuracy on the two-class classification task was the following:</p>
        <sec id="sec-3-1-1">
          <title>Simple CNN and VGG</title>
          <p>Three convolution blocks with 4, 8, and 16 filters, where each block consists of two convolutional layers followed by a Max pooling layer, and a stack of three dense layers of 64, 32 and 5 units.</p>
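          <p>In Keras, this best-performing architecture can be sketched as follows; the kernel size, activations and input resolution are assumptions, since the text specifies only the filter counts and dense layer sizes:</p>

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(256, 256, 3))
x = inputs
for filters in (4, 8, 16):
    # one block: two convolution layers followed by Max pooling
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation="relu")(x)
x = layers.Dense(32, activation="relu")(x)
outputs = layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```
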
          <p>Like for the simple model, some attempts were made to find a better starting point for the VGG model. In this case, Transfer Learning using the ImageNet dataset was the very first step and, from different experiments, it was noticeable that training only the last few layers of the VGG model provided the best results.</p>
          <p>The reason for this behavior is that, in CNNs, the first few layers capture low-level features, which in most cases are useful for image classification in general, while the last few layers capture high-level features, which are in most cases dataset (problem) specific. At the top of the model, the 1000-class layer related to the ImageNet dataset was removed and a final 5-class dense layer was added. The Adam optimizer with a 0.001 learning rate was also used.</p>
          <p>The other technique used to handle the class imbalance issue was data augmentation on the training set. For reproducibility purposes, a random seed was set while splitting the data into training, validation and test sets, and the model weights with the lowest validation loss were saved in HDF5 format.</p>
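          <p>Keeping the weights from the best epoch corresponds to a Keras ModelCheckpoint callback such as the following; the file name is illustrative:</p>

```python
import tensorflow as tf

# Keep only the model from the epoch with the lowest validation loss,
# stored in HDF5 format.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5", monitor="val_loss", save_best_only=True)

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=300, callbacks=[checkpoint])
```
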
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>For the simple CNN model, the best result obtained among all experiments was 90%, 90% and 90% for Accuracy, Precision and Recall, respectively, on the pair of classes “Szürkebarát” and “Tramini”. This method of training was chosen as a starting point because it is not time-consuming and gave us the ability to run more trials. It also enables dividing the five-class dataset into multiple two-class datasets and monitoring the model performance on each of them.</p>
      <p>Over-fitting is noticeable in Figure 8, but at this point there was no need to seek improvement, since two-class classification was not the intended task; a robust model was more interesting. When verifying the model on four classes, two problems were faced: huge over-fitting, and the tendency of the largest class to have a large number of False Positives, which leads to bad Precision and Recall. At this point, some steps were taken to smooth the effects of these problems:</p>
      <sec id="sec-4-1">
        <title>Multi-class results</title>
        <p>The number of epochs was increased to 300, and every 50 epochs the train and validation datasets were merged and randomly re-split into new train and validation datasets.</p>
        <p>While training, the model from the epoch with the best validation accuracy was saved. At the end, it was compared with the final model based on test accuracy.</p>
        <p>Among all the experiments with four classes, the best results were 88.4%, 88.4% and 88.1% for Accuracy, Precision and Recall, respectively. Figure 9 shows the performance of the model while training with four classes, based on training and validation accuracy and loss over the epochs.</p>
        <p>Finally, the model was trained with five classes, and the best results among all experiments were 83.8%, 84.4% and 84% for Accuracy, Precision and Recall.</p>
        <p>Figure 10 shows the same information as Figure 9 while
training the model with all five available classes using the
simple model.</p>
        <p>For the VGG model, some transformations (width shift, height shift, zooming, shearing and rotation) were used in Data Augmentation, which led the model to better generalization. Table 2 shows the Precision, Recall, and F1-score using the VGG model. The metrics used to measure the model’s performance were chosen because they take into account the class imbalance issue, and because of the general intuition behind them: precision measures how much noisy data is provided, in other words it is more related to the False Positive rate, while recall measures how much good data is missed; finally, the F1-score is the harmonic mean of precision and recall. The main reason the harmonic mean is used in the F1-score is to punish large differences between precision and recall. For example, with 100% precision and 0% recall, the F1-score is 0%, while the arithmetic mean would be 50%.</p>
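        <p>The contrast between the two means can be checked directly:</p>

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0 when either is 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(1.0, 0.0))  # 0.0 -- the harmonic mean punishes the imbalance
print((1.0 + 0.0) / 2)     # 0.5 -- the arithmetic mean hides it
```
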
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this research, we investigated different deep learning techniques to overcome data shortage and class imbalance issues. Through experiments, we noticed that even deep learning models, which normally require a lot of data, can perform very well on a small imbalanced dataset using techniques such as stratified sampling, data augmentation, and transfer learning. In our first experiment, using a simple CNN model, we got an accuracy of around 83.8% and almost the same for the other metrics (Precision, Recall, and F1-score), while in the second experiment a VGG model was used with a combination of different techniques, reaching very good results of about 87% for accuracy and the other metrics.</p>
      <p>The results indicate that even if a large amount of data is preferable, it is possible to overcome the previously mentioned issues with satisfactory results. In addition, the applied techniques contributed to the non-appearance of over-fitting, making the models less dataset dependent.</p>
      <p>It is also worth noting that, in cases where the required level of accuracy is very high, above 90% or 95%, the techniques applied may not be recommended without further dataset analysis, since these techniques may sacrifice accuracy to avoid other problems.</p>
      <p>It is also important to notice that one of the models is already known in the literature and the other did not require any major framework to be built, only the application of systematic and incremental analysis while interpreting the results obtained at each step.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>We would like to thank Telekom, which has us as one of its technology partners in the Telekom Innovation Laboratories, and the Tempus Public Foundation for the financial support through the Stipendium Hungaricum Scholarship Programme.</p>
      <p>The research has been supported by the European
Union, co-financed by the European Social Fund
(EFOP3.6.2-16-2017-00013, Thematic Fundamental Research
Collaborations Grounding Innovation in Informatics and
Infocommunications).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , G. Yang,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ji</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Gupta</surname>
          </string-name>
          .
          <article-title>On definition of deep learning</article-title>
          .
          <source>In 2018 World Automation Congress (WAC)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Guillaume</given-names>
            <surname>Chassagnon</surname>
          </string-name>
          , Maria Vakalopolou, Nikos Paragios, and
          <string-name>
            <surname>Marie-Pierre Revel</surname>
          </string-name>
          .
          <article-title>Deep learning: definition and perspectives for thoracic imaging</article-title>
          .
          <source>European Radiology</source>
          ,
          <volume>30</volume>
          :
          <fpage>2021</fpage>
          -
          <lpage>2030</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Sakshi</given-names>
            <surname>Indolia</surname>
          </string-name>
          , Anil Kumar Goswami,
          <string-name>
            <given-names>S.P.</given-names>
            <surname>Mishra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Pooja</given-names>
            <surname>Asopa</surname>
          </string-name>
          .
          <article-title>Conceptual understanding of convolutional neural network- a deep learning approach</article-title>
          .
          <source>Procedia Computer Science</source>
          ,
          <volume>132</volume>
          :
          <fpage>679</fpage>
          -
          <lpage>688</lpage>
          ,
          <year>2018</year>
          .
          <source>International Conference on Computational Intelligence and Data Science.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Rikiya</given-names>
            <surname>Yamashita</surname>
          </string-name>
          , Mizuho Nishio, Richard Do, and
          <string-name>
            <given-names>Kaori</given-names>
            <surname>Togashi</surname>
          </string-name>
          .
          <article-title>Convolutional neural networks: an overview and application in radiology</article-title>
          .
          <source>Insights into Imaging</source>
          ,
          <volume>9</volume>
          ,
          06
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Moreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Horvath</surname>
          </string-name>
          . A General Introduction to Data Analytics. Wiley,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Karen</given-names>
            <surname>Simonyan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          .
          <source>arXiv arXiv:1409.1556v6 (ICLR</source>
          <year>2015</year>
          ),
          <issue>10</issue>
          <year>Apr 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Van L.</given-names>
            <surname>Parsons</surname>
          </string-name>
          .
          <source>Stratified Sampling</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          . American Cancer Society,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Elizabeth</given-names>
            <surname>Tipton</surname>
          </string-name>
          .
          <article-title>Stratified sampling using cluster analysis: A sample selection strategy for improved generalizations from experiments</article-title>
          .
          <source>Evaluation Review</source>
          ,
          <volume>37</volume>
          (
          <issue>2</issue>
          ):
          <fpage>109</fpage>
          -
          <lpage>139</lpage>
          ,
          <year>2013</year>
          . PMID:
          <volume>24647924</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Lang</surname>
          </string-name>
          , Edo Liberty, and
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Shmakov</surname>
          </string-name>
          .
          <article-title>Stratified sampling meets machine learning</article-title>
          .
          <source>In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48</source>
          , ICML'
          <volume>16</volume>
          , page 2320-
          <fpage>2329</fpage>
          . JMLR.org,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Longhua</given-names>
            <surname>Qian</surname>
          </string-name>
          , Guodong Zhou, Fang Kong, and
          <string-name>
            <given-names>Qiaoming</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>Semi-supervised learning for semantic relation classification using stratified sampling strategy</article-title>
          .
          <source>In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1437</fpage>
          -
          <lpage>1445</lpage>
          , Singapore,
          <year>August 2009</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Uchida</surname>
          </string-name>
          .
          <article-title>A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data</article-title>
          .
          <source>PLoS ONE</source>
          <volume>11</volume>
          (
          <issue>4</issue>
          ): e0152173,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Luke</given-names>
            <surname>Taylor</surname>
          </string-name>
          and Geoff Nitschke.
          <article-title>Improving deep learning using generic data augmentation</article-title>
          .
          <source>CoRR, abs/1708.06020</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Luis</given-names>
            <surname>Perez</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jason</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>The effectiveness of data augmentation in image classification using deep learning</article-title>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Karl</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Taghi M.</given-names>
            <surname>Khoshgoftaar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>DingDing</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>A survey of transfer learning</article-title>
          .
          <source>Journal of Big Data</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):9,
          May
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <article-title>New computer vision challenge wants to teach robots to see in 3D</article-title>
          .
          <source>New Scientist, 7 April 2017. Retrieved 3 February</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>John</given-names>
            <surname>Markoff</surname>
          </string-name>
          .
          <article-title>For Web Images, Creating New Technology to Seek and Find</article-title>
          .
          <source>The New York Times, Retrieved 3 February</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Olga</given-names>
            <surname>Russakovsky</surname>
          </string-name>
          , Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , et al.
          <article-title>ImageNet large scale visual recognition challenge</article-title>
          .
          <source>International journal of computer vision</source>
          ,
          <volume>115</volume>
          (
          <issue>3</issue>
          ):
          <fpage>211</fpage>
          -
          <lpage>252</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramírez-Gallego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luengo</surname>
          </string-name>
          , et al.
          <article-title>Big data preprocessing: methods and prospects</article-title>
          .
          <source>Big Data Anal</source>
          <volume>1</volume>
          ,
          <issue>1</issue>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Yann</given-names>
            <surname>LeCun</surname>
          </string-name>
          , Léon Bottou, Yoshua Bengio, and
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Haffner</surname>
          </string-name>
          .
          <article-title>Gradient-based learning applied to document recognition</article-title>
          .
          <source>In Proceedings of the IEEE</source>
          , pages
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>