Less is More when Applying Transfer Learning to Multi-Spectral Data

Yuvraj Sharma1 and Robert Ross2

1 Adapt Centre, Technological University Dublin, Ireland
  d18129636@mytudublin.com
2 Adapt Centre, Technological University Dublin, Ireland
  robert.ross@tudublin.ie

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).



Abstract. Transfer Learning is widely recognized as providing incredible benefits to many image processing tasks. But not all problems in the computer vision field are driven by the traditional Red, Green, and Blue (RGB) imagery that most large pre-trained models for Transfer Learning assume. Satellite-based remote sensing applications, for example, typically use multispectral bands of light. While transferring RGB features to this non-RGB domain has been shown to generally give higher accuracy than training from scratch, the question remains whether a more suitable fine-tuning method can be found. Given this challenge, this paper presents a study in multispectral image analysis using multiple methods to achieve feature transfer. Specifically, we train and compare two pre-trained models based on the Resnet50 architecture and apply them to a multispectral image processing task. The key difference between the two models is that one was pre-trained in a conventional way on RGB data, while the other was trained on a single-band greyscale variant of the same data. Our results demonstrate improved performance for the greyscale pre-trained model relative to the more traditional RGB model.

Keywords: Deep learning · Transfer learning · Image Analysis · Resnet · CNN · Multispectral images · ImageNet · Satellite imagery · EuroSat.


                                1    Introduction

The rapid advancement of artificial intelligence (AI) in the computer vision field has increased the demand for large-scale labelled data. AI has enabled organisations to look towards Earth Observation (EO) or Remote Sensing (RS) for collecting information on buildings, natural structures, and urban and rural boundaries, as well as for the prediction and estimation of natural calamities, forest fires, melting glaciers, and vanishing forest cover, and for monitoring humanitarian crises. Satellite image classification has many challenges of its own, such as high variability, small labelled datasets, low spatial resolution, and the multispectral nature of the images. Normalization of satellite images is also not easy, mainly due to the presence of clouds in EO images, prevailing weather conditions, and changes in the lighting of an area at different times of the year. All these issues make the creation of a large labelled EO dataset very difficult. The EuroSat and SpaceNet [1] datasets have tried to solve the problem of small labelled datasets. Other approaches include unsupervised feature extraction from images [2], using large RGB-trained networks such as VGG16 for transfer learning [3], and training a Convolutional Neural Network (CNN) from scratch on the small amount of available labelled data.
    Attempts have been made to transfer features learned on ImageNet classification problems to the smaller target satellite imaging domains. There are some key differences between the natural image domain of ImageNet and RS domains: objects are very small in satellite imagery, and these images are multispectral in nature, meaning an image summarizes multiple frequency channels. More information is stored in an RS image than in a typical RGB image. Since these two domains are of a primarily different nature, the accuracies achieved are either not very high or the results are not reproducible. Arguably the primary reason for this is that images taken by satellite are multispectral in nature, meaning they have multiple bands representing an image beyond just the RGB or visible bands.
    Transfer learning with a model pre-trained on RGB (coloured) images is arguably not the right approach when the target dataset consists of multispectral images, or images with multiple bands. State-of-the-art solutions transfer ImageNet RGB features to multispectral domains, and even to single-channel greyscale domains like medical imaging [4]. Multispectral images and natural images are extremely different from one another, so any meaningful transfer is highly doubtful. It has also been observed that the usefulness of a pre-trained network decreases as the task the network is trained on moves away from the target task [5].
    Given the above, this work hypothesizes that a large CNN trained on single-channel images can learn more relevant features for multispectral image analysis than one trained on coloured images. This might seem counter-intuitive at first: a pre-trained RGB model would be assumed to learn more features than a similarly structured model trained on only one channel. However, we argue that these three particular channels and their colour-based inter-dependencies introduce a bias into the pre-trained model that is not met by the multi-spectral target domain, where multiple different channels are available. While the ideal case would be a model pre-trained on full multispectral information, with exactly the same number of channels as the target domain, this is not yet a reality. We argue that a compromise is to apply a pre-trained model to each individual channel of a multi-spectral image, but to make sure that the pre-trained model is optimised to provide useful features for single-channel analysis.
    To be more concrete, this research approaches the problem of classifying a multispectral image by first training a large network, Resnet50, on a single-channel image dataset, and then using this network to transfer features to the target domain of multispectral image classification. We compare this approach to a more traditional configuration where the Resnet50 model is pre-trained with RGB data. We proceed with a short review of relevant literature before detailing our experimental methodology and results.


2   Literature Review
CNNs are the default for handling image data. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual image classification and detection competition on a large natural image database [6]. ResNet [7] won ILSVRC in 2015. ResNet was a truly deep network, with 152 layers, and is one of the fastest, most widely used, and most widely accepted ImageNet-trained models. ResNet handled the problems of vanishing and exploding gradients, which crop up as a network grows deeper, by using Residual Blocks (combinations of conv-relu-conv layers) and allowing skip connections between these blocks. During back propagation, the gradient flows easily through the network without getting lost or becoming very weak.
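
As a rough illustration, the sketch below shows a minimal Keras-style residual block; the real ResNet50 uses bottleneck blocks with batch normalisation, which are omitted here for clarity, and the filter count is a placeholder.

```python
# A rough sketch of a residual (identity) block in Keras; ResNet50's actual
# bottleneck design and batch normalisation are omitted for clarity.
from tensorflow.keras import layers

def identity_block(x, filters):
    """conv-relu-conv with a skip connection around it."""
    shortcut = x                                      # the skip connection
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])                   # gradients flow through the add
    return layers.Activation("relu")(y)
```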
    If the dataset is smaller, we cannot simply apply bigger and bigger networks as was the case for the ImageNet challenge; instead, we need to apply another technique. One such way to get good accuracy is transfer learning [8–11]. Large neural networks trained on large image datasets like ImageNet have shown that a network's first few layers learn features similar to Gabor filters. These initial layers carry information such as the locations of edges, boundaries, corners, and shapes present inside an image.
    These features contained in the initial layers are not specific to any task. They are general in nature and are applicable to all sorts of target images and tasks [5]. In this landmark paper, the authors established some of the key concepts of transfer learning: features transition from general to specific between the first few and the final few layers; features are not highly transferable to a distant target task; and lastly, any kind of transfer is better than random initialization, even when the source and target tasks are not that similar. Generally, the target task is much smaller than the source task. Transfer learning is of two types: one in which the pre-trained features learned from the source are frozen and only new layers are trained on the smaller target task, and a second in which some or all source layers are also trained on the target task, i.e. their learned weights are fine-tuned. A minimal sketch of the two modes follows.
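
The sketch below illustrates the two modes with a Keras ResNet50 base; `num_classes`, the unfreeze depth, and the optimizer are placeholders, not values taken from the cited works.

```python
# A sketch of the two transfer-learning modes described above, assuming a
# Keras ResNet50 base; `num_classes` and the unfreeze depth are placeholders.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

num_classes = 10                      # placeholder for the target-task classes

base = ResNet50(weights="imagenet", include_top=False, pooling="avg")

# Type 1: freeze the pre-trained features; only the new head is trained.
base.trainable = False

# Type 2 (fine-tuning): also let some or all source layers update, e.g.:
# for layer in base.layers[-10:]:
#     layer.trainable = True

model = models.Sequential([base, layers.Dense(num_classes, activation="softmax")])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```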
    Models pre-trained on ImageNet have given great results in transfer learning, in both supervised and unsupervised domains [12–14]; these papers further established that large CNN networks have an intrinsic capability for learning general transferable features. Further efforts have been made to understand how neural networks are able to generalize so well and how to make them more robust [15, 16].
    Practitioners with small labelled datasets can also use data augmentation to increase the dataset size and thus improve model fitting [17]. This problem is especially common in a multispectral domain like satellite imaging. Data augmentation also helps in reducing overfitting. It increases the dataset size by either warping or oversampling the data while making sure that labels are preserved. General forms of image augmentation include data warping techniques of geometric and colour transformations, such as obtaining a new image by cropping, flipping, shearing, or inverting an image [18]. Augmentation in computer vision problems has been in use for the last couple of decades and was first seen in [19].
    EO and RS have only recently received attention from researchers around the world. Data from these domains has the capability of bringing about significant improvements in agriculture. The use of RGB and near-infrared images captured by low-orbit observational systems for estimating produce and mapping plantation areas has been advocated as best practice [20]. Researchers have also used satellite images to detect sections of roads and urban areas covered by flood waters [21], and there are many more applications of satellite data like these two.
    The problem with these EO datasets is that, firstly, there are very few and very small labelled datasets available and, secondly, image features in these datasets are quite different from those in natural image datasets, which have images of cats, dogs, fish, scorpions, cars, trucks, houses, ships and so on. UCMerced, for example, is a popular satellite imaging dataset [26]. It is a fairly small dataset with 21 land-cover classes, 100 RGB images per class, and 256×256 pixel dimensions. Likewise, the other datasets in use also have images in the few hundreds for every class label. In a supervised problem-solving approach, the performance of a classifier depends on the size and quality of a suitably labelled dataset. [12] suggested that deep networks learn features that can be treated on par with, or even better than, the traditional manual methods used in computer vision. Thus, several attempts have been made to use pre-trained deep models to learn features in multispectral RS data [22–25]. All these studies have performed classification on RS datasets using ImageNet-trained large deep networks.


3   Methodology
The purpose of this research is to test whether single-channel features are better than RGB features for models that are trying to learn multispectral data. This research suggests using greyscale pre-training rather than colour (RGB) pre-training to try to improve classification results on multispectral satellite data.
    To achieve this we assembled a set of RGB images combined from the mini-ImageNet and EuroSat datasets. For single-channel features, greyscale (single-channel) variants of these images were created. These greyscale images are the same as the RGB ones, except that they are converted to greyscale, using image preprocessing methods, prior to the training process. The multispectral data consists of satellite-captured multi-band images. The dataset that the study uses has TIFF images consisting of thirteen spectral bands each. Six single bands are extracted from these .tiff images, as sketched below. Experiments are conducted using the ResNet50 network, which has a CNN as its main building block.
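
As a hedged illustration, a band-extraction step of this sort might look as follows; the paper does not specify the tooling used, so the `rasterio` library, the file path, and the band ordering are all assumptions.

```python
# A hedged sketch of single-band extraction, assuming the `rasterio` library,
# a hypothetical file path, and that the 13 bands are stored in spectral order.
import rasterio

with rasterio.open("eurosat_ms/sample.tif") as src:   # hypothetical path
    for band_index in (2, 3, 4, 5, 8, 12):            # B02..B12, 1-based indices
        band = src.read(band_index)                   # 2-D numpy array
        # ... save `band` as a single-channel training image for that band
```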
    There are two sources of RGB and greyscale images: the mini-ImageNet dataset and the EuroSat dataset. Our design methodology is such that new models are first created using the Resnet50 architecture by training them from scratch over datasets from our two colour spaces, namely RGB and Greyscale. Also note that there are two types of dataset in each category: one is smaller and consists of non-augmented images, while the other is larger and consists of augmented images. So, four Resnet50-based models were created by training from scratch on the RGB and Greyscale datasets independently. These four pre-trained models are then used to transfer features, i.e. they are fine-tuned on target images of the individual bands B02, B03, B04, B05, B08, and B12. The performance is recorded on test sets and a comparative analysis is made of the outcomes. This yields four sets of test accuracies and F1 scores: two for RGB-based feature transfer and two for greyscale or single-channel-based feature transfer.


4     Experimental Details

4.1   Data

The experiments are conducted using multiple sets of images, where each set has distinctive features, as per the design of the research.
    The first dataset we make use of is mini-ImageNet. ImageNet itself is a large-scale, ontologically organised catalogue of images built upon the backbone of the WordNet structure. ImageNet consists of approximately 3.2 million images in total [6]. Due to the limitations of time and computational power for this particular study, a very small subset of this dataset, mini-ImageNet, is used. It has 100 classes with 500 images each, and each image is 64 pixels high and 64 pixels wide.
    While mini-ImageNet provides the backbone dataset for pre-training, we also require a target domain dataset. For this purpose we use the EuroSat dataset, which contains two sets of images, namely RGB and multispectral imagery. There are around twenty-seven thousand images in 10 classes, collected by the Sentinel-2A satellite. The classes are different land-use types, for example Residential, Industrial, Farmland, Rivers, etc. [21].
    Given that mini-ImageNet contains only five hundred images in each class, we apply data augmentation to increase the size of the dataset to two thousand images in each class. Data augmentation is done using basic geometric transformations: shifting the image across its width and height, shearing or tilting the image along one of its axes, zooming in and out of the images, flipping the image horizontally, and lastly rotating the images by not more than 90 degrees at a time. A hedged sketch of such a pipeline follows.
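
The sketch below shows these geometric augmentations using Keras' `ImageDataGenerator`; the exact parameter values are assumptions, as the text only names the transformation types.

```python
# A sketch of the geometric augmentations listed above, using Keras'
# ImageDataGenerator; the exact parameter values are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    width_shift_range=0.1,    # shift across the width
    height_shift_range=0.1,   # shift across the height
    shear_range=0.2,          # shear/tilt along an axis
    zoom_range=0.2,           # zoom in and out
    horizontal_flip=True,     # horizontal flips
    rotation_range=90,        # rotate by at most 90 degrees
)
```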
    For every augmented and non-augmented image an equivalent greyscale version is created. The images in the RGB and Greyscale sets are otherwise exactly the same. Here greyscale is used to represent the idea of a single channel, and these images are used to train a Resnet50-based network from scratch, to prepare a single-channel-trained classifier. This is later used for transferring features to the target domains of multispectral bands. It should be noted that our greyscale images were not constructed by arbitrarily selecting a single band from the input RGB image, but rather through the application of an RGB-to-greyscale transform.
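
A one-line sketch of such a transform is shown below, assuming TensorFlow's standard luminance-weighted conversion; the paper does not state which specific RGB-to-greyscale formula was used.

```python
# A minimal sketch, assuming TensorFlow's standard luminance-weighted
# RGB-to-greyscale conversion; the paper's exact formula is not stated.
import tensorflow as tf

rgb_image = tf.random.uniform((64, 64, 3))          # placeholder RGB image
grey_image = tf.image.rgb_to_grayscale(rgb_image)   # (H, W, 3) -> (H, W, 1)
```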
    There are ten classes in the EuroSat data, for both the RGB and the multispectral images. Please refer to Figure 1 to see the extracted images for all 13 bands alongside the sample RGB image.




Fig. 1. The same image extracted as 13 bands. From left to right and top to bottom: Band01, Band02, Band03, Band04, Band05, Band06, Band07, Band08, Band09, Band10, Band11, Band12, Band13, and lastly the original RGB image.



4.2   Implementation
Model Architecture Our backbone model architecture is based on a TensorFlow / Keras implementation of Resnet50, a fifty-layer deep neural network. For the prediction network, sequential TensorFlow Keras layers are added to the backbone network.
    For initial training, Resnet50 is used with random weight initialization, and its layers are kept trainable. The top layer is replaced with a fully connected layer whose number of nodes equals 110, the number of target classes (100 mini-ImageNet classes and 10 EuroSat classes). This model is trained from scratch on the combined dataset of mini-ImageNet and EuroSat, once on RGB (the RGB Model) and once on greyscale (the Greyscale Model). For the Greyscale Model, the single channel goes in as input to the three-channel input of the Resnet50 base model. On top of the trainable Resnet50 base model, one fully connected ReLU layer with 256 nodes is added, preceded and followed by Dropout layers with a rate of 0.5. A minimal sketch of this construction follows.
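
The sketch below assembles this base model in TensorFlow/Keras; values stated in the text are used where given, while the optimizer is an assumption. For the Greyscale Model, one common way (an assumption here) of feeding a single channel into the three-channel input is to replicate the channel three times.

```python
# A minimal sketch of the base-model construction described above, assuming
# TensorFlow/Keras; the optimizer is an assumption, other values follow the text.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

base = ResNet50(weights=None,            # random initialization, per the text
                include_top=False, pooling="avg",
                input_shape=(64, 64, 3))
base.trainable = True                    # all layers kept trainable

model = models.Sequential([
    base,
    layers.Dropout(0.5),                 # dropout before the FC layer
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),                 # and after it
    layers.Dense(110, activation="softmax"),  # 100 ImageNet + 10 EuroSat classes
])
model.compile(optimizer="adam",          # optimizer not stated in the paper
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```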
    The RGB Model and the Greyscale Model are each used to transfer learned features to the multispectral feature space. For this purpose, a new network is designed and used in place of the top layer of the two base models. This network consists of one fully connected dense layer with a Sigmoid activation function, followed by a dropout layer and lastly a final Softmax output layer with 10 nodes; a sketch follows the band list below. The model architecture is the same for all six bands. These bands are chosen out of the given thirteen for two main reasons: they were the top performers in the original paper [21], and some of the other bands (B01, B09, B10 and B11) are not meant for land observation at all. Thus, the bands evaluated in this study are:
Band02 – Blue
Band03 – Green
Band04 – Red
Band05 – Red Edge 1
Band08 – Near Infrared
Band12 – Shortwave Infrared 2
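
A hedged sketch of this replacement head follows; only the 10-node softmax output is stated in the text, so the width of the sigmoid layer and the dropout rate are assumptions.

```python
# A hedged sketch of the replacement head; only the 10-node softmax is stated
# in the text, so the sigmoid layer's width and the dropout rate are assumed.
from tensorflow.keras import layers, models

head = models.Sequential([
    layers.Dense(256, activation="sigmoid"),  # fully connected, sigmoid (width assumed)
    layers.Dropout(0.5),                      # rate assumed
    layers.Dense(10, activation="softmax"),   # 10 EuroSat land-use classes
])
```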


Transfer Learning The premise of transfer learning is that our pre-trained networks (the RGB Model and the Greyscale Model) contain a rich set of descriptors or filters. To use transfer learning effectively, features learned from the previous task of training over mini-ImageNet and EuroSat are transferred to the multispectral target task in a series of steps. This is achieved using "fine-tuning" techniques: learned filters are reused by training the network in parts. The network's architecture can be seen in Figure 2. The steps that this research has followed are as follows:




Fig. 2. Final network architecture. Resnet50 is used as the building block for creating the Base Model. This model is trained on RGB and greyscale images separately. Later, the Base Model is used for transferring the learned features to the target task. The figure shows the different steps in fine-tuning and feature transfer.




 – Step 1 – Train only the head of the network, i.e. the new layers that have been added to the base model, and keep the rest of the layers frozen or non-trainable. In Figure 2, the section marked (1) is the new network added on top of the Base Model, so the sections marked (2) and (3) are kept non-trainable. The fully connected layers in section (1) are initialised with random weights and trained over the EuroSat single-band images extracted from the multispectral .tif images. Reasoning: this way only a part of the network is trained at first, and the weight corrections are not back-propagated into the entire network. If the whole network were allowed to train from scratch on the target data, there would be a risk of losing the features and filters learned by the fully trained base model. This training is done only for a few epochs (5), so that the final layers can learn the requisite features or patterns on the target data.
 – Step 2 – Additionally train the non-convolutional layers in the Base Model, with no weight updates for the Resnet50 convolutional layers, i.e. train the section (1) and section (2) layers shown in Figure 2. In steps 1 and 2, the network is being warmed up for the task at hand.
 – Step 3 – Unfreeze the last residual (convolutional) block of the Resnet50 layers, the terminal block in section (3) of Figure 2, and fine-tune the network over the target data for a larger number of epochs. A sketch of this schedule follows.
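
The sketch below outlines this three-step schedule, reusing the hypothetical `base` and `head` objects from the earlier sketches; the optimizer, the unstated epoch counts, and the exact boundary of the final residual block are assumptions.

```python
# A sketch of the three-step schedule above, reusing the hypothetical `base`
# and `head` from earlier; optimizer and unfreeze boundary are assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([base, head])

# Step 1: train only the new head for a few epochs; the base stays frozen.
base.trainable = False
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(band_train, epochs=5)          # `band_train` is hypothetical

# Step 2: additionally train the base model's non-convolutional layers.
base.trainable = True
for layer in base.layers:
    layer.trainable = not isinstance(layer, layers.Conv2D)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(band_train, epochs=5)          # warm-up; epoch count assumed

# Step 3: unfreeze the last residual block and fine-tune for more epochs.
for layer in base.layers[-12:]:            # block boundary is an assumption
    layer.trainable = True
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(band_train, epochs=50)         # epoch count assumed
```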


5      Results
We first present the results of the pre-training process before detailing the specific results for the target EO data.

5.1     Base Models
Performance on the land-cover classification task is taken as the measure to answer the research question. Two similar models are trained separately on the non-augmented RGB and Greyscale images, where the number of training images per class is low. Two other similar models are trained separately on the four-times larger database of augmented RGB and Greyscale images. All four models reached a high training accuracy, in the late 90s percent, after training for just over 100 epochs. However, the validation scores were low due to the large network size, the smaller training data, and processing limitations. Nonetheless, at the end of this base-model training these base models had learned many transferable features from their respective colour spaces.

5.2     Model on Band B02, B03, B04, B05, B08, and B12
The model architecture for all bands is identical and was discussed above. In total, twenty-four models were created over six training sets belonging to six different bands. Table 1 below shows the overall accuracy values over the test sets for these six bands, where AUG denotes base models created over the larger augmented datasets. It is evident from the table that for all bands the highest accuracy was recorded when Greyscale Augmented data was used to train the base model. Further note that for five out of six bands, the Greyscale base model outperformed the RGB base model. The grouped bar charts shown in Figure 3 clearly depict this behaviour for all bands over the different base models. On average, performance was worst when RGB images were used to train the base model and best when Greyscale images were used. Also note that augmentation helped to increase performance in both cases, RGB and Greyscale. Data augmentation made a large difference to the base model's capability to transfer general or relatable features.


Table 1. Overall test-set accuracy (%) for each band, as measured for every base model.

      Bands                   RGB   GREY  RGB-AUG GREY-AUG
      Band B02 – Blue         57.70 60.67 59.87   66.30
      Band B03 – Green        54.82 56.19 61.27   65.83
      Band B04 – Red          55.02 59.62 59.71   62.60
      Band B05 – Red Edge 1   43.44 43.32 45.44   52.78
      Band B08 – NIR          55.82 58.84 59.64   64.05
      Band B12 – SWIR 2       43.11 45.15 47.29   54.35




Fig. 3. Grouped bar charts depicting the performance on each band under the different base models. The Greyscale Augmented base model clearly outperforms in every group.


Model Band B02 For all the bands, Precision, Recall, and F1 scores were plotted for the Grey-trained and RGB-trained base models in both the augmented and non-augmented settings. Figure 4 shows classification performance for Band B02 when the Grey-trained base model was used to transfer features. It can be seen that the classes Sea Lake, Residential, and Forest give the highest F1 scores, while Highway, River, and Permanent Crop are the lowest-performing classes. The high Precision, Recall, and thus high F1 score for the class Sea Lake can be explained by the fact that a water body has entirely different reflectance values from typical land bodies. A similar trend was seen for all bands and across all types of base model.

Further Generalisation While our initial approach did assume a baseline model built out of both mini-ImageNet and EuroSat RGB data, resulting in a 110-class dataset, we recognise that this is perhaps not the most generalised of baseline models. Although this yields a smaller dataset, we therefore also ran the above experiments with a more basic setup, using only mini-ImageNet data for the base-model training, with no EuroSat data in the base model. Again this approach demonstrated that a greyscale-based baseline model generally outperformed the RGB base model; please see Table 2.




     Fig. 4. Comparative per-class performance (Precision, Recall, F1) for Band B02 using the greyscale-trained base model.

6    Conclusions and Future Work
This research aims to improve the transferability of large CNN models to target domains of single- or multi-channel images with small labelled datasets, such as the domains of RS and medical imaging. Our results have shown that single-channel-trained base models are better at transferring relevant features to a multispectral problem space like that of RS than RGB-trained base models. Using this knowledge, better RS applications can be developed. Many RS fields stand to benefit from this research, such as flood detection, coastline detection, urban and rural planning, and also military research.
Table 2. Test-set accuracy (%) for each band when the base models are trained only on the mini-ImageNet dataset.

                     Bands                 RGB GREY
                     Band B02 – Blue       58.79 63.34
                     Band B03 – Green      56.57 58.68
                     Band B04 – Red        57.64 60.96
                     Band B05 – Red Edge 1 44.71 44.87
                     Band B08 – NIR        57.06 60.84
                     Band B12 – SWIR 2     45.01 44.38

Future work and Recommendations Using the base models created during this research, similar analysis can be conducted on band combinations as well. One important piece of future work is to use the full ImageNet dataset for training the base model. We believe this model will provide better results again, particularly if combined with a more complex state-of-the-art image network than the Resnet50 we have applied here. We believe that these two elements taken together, with a more generalised set of testing models, will underscore the usefulness of this contribution.


7   Acknowledgements
This research was conducted with the financial support of Science Foundation
Ireland under Grant Agreement No. 13/RC/2106 at the ADAPT SFI Research
Centre at Technological University Dublin. The ADAPT SFI Centre for Digital
Media Technology is funded by Science Foundation Ireland through the SFI
Research Centres Programme and is co-funded under the European Regional
Development Fund (ERDF) through Grant No. 13/RC/2106.

References
 1. A. Van Etten, D. Lindenbaum, and T. M. Bacastow. SpaceNet: A Remote Sensing
    Dataset and Challenge Series. arXiv:1807.01232 [cs], July 2018. arXiv: 1807.01232
 2. Saikat Basu, Sangram Ganguly, Supratik Mukhopadhyay, Robert DiBiano,
    Manohar Karki, and Ramakrishna R. Nemani. 2015. DeepSat - A Learning frame-
    work for Satellite Imagery. CoRR abs/1509.03602 (2015)
 3. Jain, P.; Schoen-Phelan, B.; Ross, R. Automatic flood detection in Sentinel-2 im-
    ages using deep convolutional neural networks. In Proceedings of the 35th Annual
    ACM Symposium on Applied Computing, Brno, Czech Republic, September 2020;
    pp. 617–623.
 4. Cheplygina, V. (2019). Cats or cat scans: transfer learning from natural or medical
    image source datasets? In arxiv:1810.05444 [cs.cv].
 5. Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are
    features in deep neural networks? In Advances in Neural Information Processing
    Systems, pages 3320–3328.
 6. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet:
    A large-scale hierarchical image database
 7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition.
    In: CVPR. (2016)
 8. Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions
    on Knowledge and Data Engineering, 22(10), 1345–1359.
 9. Rusu, A. A., Vecerik, M., Rothorl, T., Heess, N., Pascanu, R., & Hadsell, R. (2016).
    Sim-to-real robot learning from pixels with progressive nets. In arXiv preprint
    arXiv:1610.04286.
10. Mikolov, T., Joulin, A., & Baroni, M. (2015). A roadmap towards machine intelli-
    gence. In arxiv preprint arxiv:1511.08130.
11. Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In 2011 IEEE
    Conference on Computer Vision and Pattern Recognition (CVPR).
12. Razavian, A., Azizpour, H., Sullivan, J., and Carlsson, S. CNN Features off-the-
    shelf: An Astounding Baseline for Recognition. CoRR, abs/1403.6382, 2014.
13. Radford, A., Metz, L., & Chintala, S. (2016). Unsupervised representation learning
    with deep convolutional generative adversarial networks
14. Zhuang, F., Cheng, X., Luo, P., Pan, S. J., & He, Q. (2015). Supervised represen-
    tation learning: Transfer learning with deep autoencoders. In IJCAI International
    Joint Conference on Artificial Intelligence, pp. 4119–4125.
15. Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2017). Understanding
    deep learning requires rethinking generalization.
16. Kurakin, A., Goodfellow, I., & Bengio, S. (2017). Adversarial examples in the
    physical world.
17. Perez, L., & Wang, J. (2017). The effectiveness of data augmentation in image
    classification using deep learning. In arxiv preprint arxiv:1712.04621.
18. K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman, Return of the devil in the
    details: Delving deep into convolutional nets, arXiv preprint arXiv:1405.3531.
19. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied
    to document recognition. Proc IEEE 86:2278–2324
20. Moacir Ponti, Arthur A Chaves, Fábio R Jorge, Gabriel BP Costa, Adimara
    Colturato, and Kalinka RLJC Branco. Precision agriculture: Using low-cost sys-
    tems to acquire low-altitude images. IEEE computer graphics and applications,
    36(4):14–20, 2016
21. B. Bischke, P. Helber, C. Schulze, V. Srinivasan, and D. Borth. The Multimedia
    Satellite Task: Emergency Response for Flooding Events. In MediaEval, 2017.
22. O. A. Penatti, K. Nogueira, and J. A. dos Santos. Do deep features generalize from
    everyday objects to remote sensing and aerial scenes domains? In Proceedings of
    the IEEE Conference on Computer Vision and Pattern Recognition Workshops,
    pages 44–51, 2015
23. K. Nogueira, O. A. Penatti, and J. A. dos Santos. Towards better exploiting con-
    volutional neural networks for remote sensing scene classification. Pattern Recog-
    nition, 61:539–556, 2017.
24. M. Castelluccio, G. Poggi, C. Sansone, and L. Verdoliva. Land use classifica-
    tion in remote sensing images by convolutional neural networks. arXiv preprint
    arXiv:1508.00092, 2015.
25. G.-S. Xia, J. Hu, F. Hu, B. Shi, X. Bai, Y. Zhong, and L. Zhang. Aid: A benchmark
    dataset for performance evaluation of aerial scene classification. arXiv preprint
    arXiv:1608.05167, 2016.
26. Yi Yang and Shawn Newsam, "Bag-Of-Visual-Words and Spatial Extensions for
    Land-Use Classification," ACM SIGSPATIAL International Conference on Ad-
    vances in Geographic Information Systems (ACM GIS), 2010