<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Land Cover Semantic Segmentation Using ResUNet</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Loukas Kouvaras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eleni Charou</string-name>
          <email>exarou@iit.demokritos.gr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasilis Pollatos</string-name>
          <email>vaspoll97@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Harokopio University</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NCSR Demokritos</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>NTUA</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present our work on developing an automated system for land cover classification. The system takes a multiband satellite image of an area as input and outputs a land cover map of the area at the same resolution as the input. For this purpose, convolutional machine learning models were trained to predict the land cover semantic segmentation of satellite images, a case of supervised learning. The land cover labels were taken from the CORINE Land Cover (CLC) inventory and the satellite images from the Copernicus hub. As for the model, variations of the U-Net architecture were applied. Our area of interest is the Ionian islands (Greece), and we created a dataset from scratch covering this particular area. In addition, transfer learning from the BigEarthNet dataset [1] was performed. In [1] satellite images are classified into the CLC classes, but segmentation is not performed as we do here. However, their models have been trained on a dataset much bigger than ours, so we applied transfer learning, using their pretrained models as the first part of our network and utilizing the ability these networks have developed to extract useful features from satellite images (we transferred a pretrained ResNet-50 into a U-Res-Net). Apart from transfer learning, other techniques were applied in order to overcome the limitations set by the small size of our area of interest: data augmentation (cutting images into overlapping patches, applying random transformations such as rotations and flips) and cross validation. The results are evaluated on the 3 CLC class hierarchy levels, and a comparative study is made of the results of the different approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>KEYWORDS</title>
      <p>LULC, U-Net, deep learning, transfer learning, Ionio</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>Modern AI technologies, such as deep learning, can be utilized in
various fields of natural science to automate and underpin
procedures traditionally carried out by humans. Remote sensing
nowadays provides a great amount of high-quality data, updated
on a daily basis. Moreover, these data are easily produced and
open to the public, in contrast to other sources such as aerial
photography, which is of higher quality but more expensive and
less massively produced. For some problems (in our case land
cover recognition) the resolution of the open remote sensing data
(10 m for Sentinel-2) is adequate. This big remote sensing data
can be fed into machine learning models to develop automated
systems that analyse it and carry out useful tasks. Labeled data
are the most useful, as they can be utilised for supervised
learning, which solves a great range of problems.</p>
      <p>CLC provides a huge labeled dataset, containing maps for most
of Europe over the last three decades. Our goal is to train
models to predict the labels of the CLC dataset. Most research in
this field assigns one or more land cover labels to a whole
satellite image patch (which can cover an area of several square
kilometres). Our approach to the problem is more general: we
construct a semantic segmentation of the satellite image into the
full range of land cover classes provided by the CORINE Land
Cover inventory, at the maximal resolution provided by Sentinel-2
satellite images, which is 10 m. The classes of CLC are hierarchical,
and we test the ability of the models to predict the classes on
each of the hierarchical levels. As expected, we see that the
superclasses on the higher levels are discriminated with greater
accuracy than the subclasses on the lower levels.</p>
      <p>Corine Land Cover has a wide variety of applications,
underpinning various Community policies in the domains of environment,
agriculture, transport and spatial planning. Developing a
system that automates the production of CLC maps to some extent
is important because CLC needs to be updated every few years.
Creating these maps manually is a burdensome and time-consuming
job, and even so the accuracy of the produced maps is not
perfect. An automatic land cover classification system could help
develop such maps in the future and track down sudden or short-term
changes that happen to the land cover (for example due to natural
disasters or to fast-track rural and urban development). It could
also be applied to areas that are not covered by the CLC.</p>
      <p>State-of-the-art deep learning models were used, and the training
and testing were done in the area of Ionio. This is a case of work on
a relatively small area with special geological and natural features.
It is also an area of varying morphology and landscapes, with small-scale
land cover characteristics that can hardly be detected at the
resolution provided by Sentinel-2 images. Similar approaches can be
used for training and testing in other areas covered by the Sentinel-2
satellites. As a first step we trained a simple U-Net from scratch on
the area of interest. Recently, a similar research effort at TU
Berlin produced the BigEarthNet. They perform simple
classification of satellite images into the classes of CLC, but not segmentation
as we do. However, their models have been trained on a dataset
much bigger than ours, so we applied transfer learning, using their
pretrained models as the first part of our network and utilizing the
ability these networks have developed to extract useful features
from the satellite images (we transferred a pretrained ResNet-50 into
a U-Res-Net). Apart from transfer learning, other techniques were
applied in order to overcome the limitations set by the small size of
our area of interest: data augmentation (cutting images
into overlapping patches, applying random transformations such
as rotations and flips) and cross validation. Our area of interest
is distributed over 6 Ionian islands (Corfu, Paxi, Lefkada, Kalamos,
Kefalonia, Zante) and the coast of Parga.</p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        Land Cover Recognition gathers a lot of interest in the research
community. In our work we apply transfer learning from the
models trained in BigEarthNet [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The BigEarthNet dataset contains
590,326 non-overlapping image patches of size 1200 m × 1200 m,
distributed over 10 European countries (Austria, Belgium, Finland,
Ireland, Kosovo, Lithuania, Luxembourg, Portugal, Serbia and
Switzerland). Each image patch is annotated with multiple land-cover classes
(i.e., multi-labels) that are provided from the CORINE Land Cover
database of the year 2018 (CLC 2018). They train models that take
each patch as input and predict the classes appearing in this patch.
They solve a simpler problem than ours, because the resolution of
the output of their models is 1200m, while the resolution of our
predicted maps is 10m. However, their models have been trained on
a dataset much bigger than ours and have learned to extract useful
features from the images (encoding) that are later on decoded to
solve their task. We are using the pretrained encoder of a ResNet-50
trained on BigEarthNet as the encoder part of a U-Net-like
architecture to solve our semantic segmentation problem. This approach
has also been adopted by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. UNet architecture was introduced
in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. ResNetUnet, the architecture we are using, is commonly
used for such problems. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] a sophisticated ResNetUnet that
performs multitasking achieved state of the art results for the ISPRS
2DPotsdam dataset. One of the subproblems solved in this
multitasking is finding the class boundaries, which is also proposed in
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However, as far as our problem is concerned, these methods
are applied to high-resolution images of urban areas and may be
of little use for our problem. In order to overcome the limitations set
by our small dataset, data augmentation is applied as in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. In our work we used Sentinel-2 bands with 10m resolution
and bands with 20m resolution. Others have used multisource data
including optical data and Sentinel-1 radar measurements [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] ,[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ],
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Multi-temporal data, viewing the same area at different
timestamps, is another approach, taken in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ],[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ],[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. In order to
deal with missing labels active learning [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], self-learning [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
and weakly supervised learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is performed.
      </p>
    </sec>
    <sec id="sec-4">
      <title>METHODOLOGY</title>
    </sec>
    <sec id="sec-5">
      <title>Dataset</title>
      <p>Our dataset was created from multispectral satellite images of the
Ionian Islands, downloaded from Copernicus for the period of 2018,
and from the part of CLC 2018 that covers the Ionian Islands. The CLC
vector files were georeferenced together with the Copernicus images,
turned into rasters with 10 m resolution, and clipped to
the same bounds, creating tifs for each of the islands. These
tifs were cut into patches of size 1.28 km × 1.28 km (128 × 128 pixels) with
a high degree of overlap. Xdata consists of these patches, having
the satellite image bands as features for each pixel, and Ydata
consists of the corresponding CLC patches. Our networks are trained
to solve the task of predicting the CLC label for each pixel of the
input patch, given the band measurements for each pixel of the
input patch. So we are trying to find a function f such that Ydata =
f(Xdata); this is a case of supervised learning. Our area of interest
is distributed over 6 Ionian islands (Corfu, Paxi, Lefkada, Kalamos,
Kefalonia, Zante) and the coast of Parga.</p>
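      <p>As an illustration of the patch cutting described above, the following numpy sketch (our own minimal example, not the project's actual code) cuts a band stack and its label raster into overlapping 128 × 128 patches:</p>

```python
import numpy as np

def cut_patches(bands, labels, size=128, hop=64):
    """Cut a (C, H, W) band stack and an (H, W) label raster into
    overlapping size x size patches placed on a regular grid."""
    C, H, W = bands.shape
    X, Y = [], []
    for r in range(0, H - size + 1, hop):
        for c in range(0, W - size + 1, hop):
            X.append(bands[:, r:r + size, c:c + size])
            Y.append(labels[r:r + size, c:c + size])
    return np.stack(X), np.stack(Y)

# toy raster: 4 bands, 256 x 256 pixels -> a 3 x 3 grid of patches
bands = np.zeros((4, 256, 256), dtype=np.float32)
labels = np.zeros((256, 256), dtype=np.int64)
X, Y = cut_patches(bands, labels)
print(X.shape, Y.shape)  # (9, 4, 128, 128) (9, 128, 128)
```

      <p>Each pair (X[i], Y[i]) is then one training example for the pixel-wise mapping Ydata = f(Xdata).</p>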
      <p>[Figure: the study areas: Kalamos, Parga, South Corfu, North Corfu, Kefalonia, Lefkada, North Zante, Paxi]</p>
      <sec id="sec-5-8">
        <title>Study areas</title>
        <p>For each area we have the Sentinel-2 10 m resolution bands (R, G, B,
infrared), the Sentinel-2 20 m resolution bands (B05, B06, B07, B8A,
B11 and B12), and the CORINE Land Cover class label for each pixel. In
our problem the satellite image bands are the inputs to our network
and the CLC classes the expected output.</p>
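        <p>The 20 m bands have to be brought to the 10 m grid before they can be stacked with the 10 m bands. A minimal sketch, assuming simple nearest-neighbour upsampling (the paper does not state which resampling method was used):</p>

```python
import numpy as np

def upsample_20m_to_10m(band20):
    """Nearest-neighbour upsampling of a 20 m band to the 10 m grid:
    every pixel is repeated 2 x 2, doubling both spatial dimensions."""
    return np.repeat(np.repeat(band20, 2, axis=0), 2, axis=1)

b05 = np.arange(4, dtype=np.float32).reshape(2, 2)  # toy 2 x 2 "20 m" band
b05_10m = upsample_20m_to_10m(b05)
print(b05_10m.shape)  # (4, 4)
```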
        <p>The CORINE Land Cover classes are organised hierarchically into three levels. Our
approach is to train the models on the full range of CORINE Land
Cover classes and then test them on each level separately.</p>
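        <p>The three hierarchy levels can be read directly off the CLC class codes; a small helper function (ours, for illustration) truncates a level-3 code to any requested level:</p>

```python
def clc_at_level(code, level):
    """Truncate a level-3 CLC code such as '3.2.1' to the requested
    hierarchy level (1, 2 or 3)."""
    return ".".join(code.split(".")[:level])

print(clc_at_level("3.2.1", 1))  # 3
print(clc_at_level("3.2.1", 2))  # 3.2
```

        <p>Evaluating at level 1 or 2 then amounts to comparing the truncated codes of the prediction and the label.</p>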
        <p>The area of interest has to be split into training and test sets.
Due to the small size of our dataset we chose not to use a validation
set for the fine tuning of hyperparameters such as the number of
epochs; the training process was stopped when the loss function
started to converge, not when it was minimal on the validation
set. We are performing cross validation, so the area of interest has
to be divided into a number of subsets of approximately the same size.
The area of interest was partitioned into the following 6 subsets: 1.
north Corfu, 2. south Corfu, 3. west Kefalonia, 4. east Kefalonia, 5.
Lefkada, 6. Paxi + North Zante + Kalamos + Parga. The splitting into
training and validation sets is done 6 times, so that each time a
different subset is the validation set and the remaining 5 are the
training set.</p>
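        <p>The 6-fold rotation described above can be sketched as:</p>

```python
subsets = ["north Corfu", "south Corfu", "west Kefalonia",
           "east Kefalonia", "Lefkada", "Paxi+North Zante+Kalamos+Parga"]

def spatial_folds(subsets):
    """Yield (train, validation) splits: each subset is the validation
    set exactly once and the remaining ones form the training set."""
    for i, val in enumerate(subsets):
        train = subsets[:i] + subsets[i + 1:]
        yield train, val

folds = list(spatial_folds(subsets))
print(len(folds))  # 6
```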
        <p>Each area is cut into overlapping patches; the overlaps are a
form of data augmentation. Patch size is 1.28 km × 1.28 km and
the hop between adjacent patches is 0.64 km in each direction
(longitude and latitude). Two memory optimisations were applied.
Firstly, patches are stored by defining only their limits in the original
satellite image, and the cutting is only performed on dataloading.
Secondly, patches containing only sea are discarded (e.g. the blue
square in the right image below). This is a good practice because
it turns out that the models are able to learn to recognise the sea
almost perfectly even without those patches. It also reduces class
imbalance, as sea patches are the most frequent ones in our
area.</p>
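        <p>The two optimisations can be sketched as follows: the patch list stores only the (row, col) limits, all-sea windows are dropped, and cutting happens on demand at dataloading. The integer id used here for the sea class is hypothetical:</p>

```python
import numpy as np

SEA = 23  # hypothetical integer id of the 'Sea and ocean' class

def patch_index(labels, size=128, hop=64, sea_class=SEA):
    """Build a list of patch limits (row, col) instead of materialising
    the patches, discarding windows that contain only sea."""
    H, W = labels.shape
    bounds = []
    for r in range(0, H - size + 1, hop):
        for c in range(0, W - size + 1, hop):
            window = labels[r:r + size, c:c + size]
            if not np.all(window == sea_class):
                bounds.append((r, c))
    return bounds

def load_patch(bands, r, c, size=128):
    """Cut a single patch on demand (at dataloading time)."""
    return bands[:, r:r + size, c:c + size]

labels = np.full((256, 256), SEA)
labels[0, 0] = 1  # a single land pixel
bounds = patch_index(labels)
print(bounds)  # [(0, 0)]
```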
        <p>For the transfer learning experiments the data needed to be
standardised using the same mean and std values as the base model.
On dataloading, random flips and rotations were applied for the
purposes of data augmentation.</p>
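        <p>A minimal sketch of this preprocessing, assuming per-band mean/std vectors and 90-degree rotations (function names are ours, not the project's):</p>

```python
import numpy as np

def standardise(x, mean, std):
    """Standardise each band with the mean/std of the base
    (pretrained) model, as required for transfer learning."""
    return (x - mean[:, None, None]) / std[:, None, None]

def random_flip_rotate(x, y, rng):
    """Apply the same random flip / 90-degree rotation to the input
    bands (C, H, W) and the label patch (H, W)."""
    k = rng.integers(4)            # number of 90-degree rotations
    x = np.rot90(x, k, axes=(1, 2))
    y = np.rot90(y, k, axes=(0, 1))
    if rng.integers(2):            # random flip of the last axis
        x = x[:, :, ::-1]
        y = y[:, ::-1]
    return np.ascontiguousarray(x), np.ascontiguousarray(y)

rng = np.random.default_rng(0)
x = np.zeros((4, 128, 128))
y = np.zeros((128, 128))
xa, ya = random_flip_rotate(standardise(x, np.zeros(4), np.ones(4)), y, rng)
print(xa.shape, ya.shape)  # (4, 128, 128) (128, 128)
```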
      </sec>
      <sec id="sec-5-9">
        <title>Models</title>
        <p>Two different approaches were followed. The first approach was
to train a baseline U-Net from scratch on the area of interest. The
second approach was to perform transfer learning. The transfer
learning U-Net model has a ResNet-50 architecture on the encoder
part, and the weights of the encoder are initialised to the values
of the weights of a ResNet-50 trained on the BigEarthNet. The
figure below shows the exact architecture of the transfer learning
model. There are approximately 66,000,000 trainable parameters in
this model. A more complex version of this model, which applied no
compression on the outputs of the encoder that were passed to the
decoder through shortcuts, had 91,000,000 trainable parameters and
improved fitting on the training set, but didn't seem to generalise
better than the model presented below.</p>
        <p>The baseline U-Net model that was trained from scratch solved
an easier problem, as the output and the ground truth land cover
images had a resolution of 100 m.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Training</title>
      <p>We are trying to solve a semantic segmentation problem, and a
composite of a dice loss and a binary cross entropy with logits loss
is used as the criterion. The two loss criteria are summed, each one with a weight
factor of 0.5. We experimented with positive weights p_c in the BCE:</p>
      <p>ℓ_c(x, y) = L_c = {l_{1,c}, . . . , l_{N,c}}⊤,
l_{n,c} = −w_{n,c} [p_c · y_{n,c} · log σ(x_{n,c}) + (1 − y_{n,c}) · log(1 − σ(x_{n,c}))],
where c is the class number.</p>
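      <p>The composite criterion can be sketched in numpy for a single class channel; σ is the sigmoid and pos_weight plays the role of p_c. This is an illustrative re-implementation under our own naming, not the project's code:</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_with_logits(x, y, pos_weight=1.0, eps=1e-7):
    """Per-pixel binary cross entropy with logits; pos_weight is the
    p_c factor on the positive term."""
    s = np.clip(sigmoid(x), eps, 1 - eps)
    return -(pos_weight * y * np.log(s) + (1 - y) * np.log(1 - s)).mean()

def dice_loss(x, y, eps=1e-7):
    """Soft dice loss on the sigmoid probabilities."""
    s = sigmoid(x)
    inter = (s * y).sum()
    return 1 - (2 * inter + eps) / (s.sum() + y.sum() + eps)

def composite_loss(x, y):
    """Equal-weight (0.5 / 0.5) sum of the two criteria."""
    return 0.5 * bce_with_logits(x, y) + 0.5 * dice_loss(x, y)

# confident correct predictions give a loss close to 0
y = np.array([[1.0, 0.0], [0.0, 1.0]])
logits = np.where(y > 0, 20.0, -20.0)
print(round(composite_loss(logits, y), 6))  # 0.0
```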
      <p>Setting p_c according to a class-balancing ratio raised to a power a, for different
values of a in (0, 1], deteriorated our results.
The Adam optimiser is used to achieve fitting on the training data, with an initial
learning rate of 5 · 10^−4 that gradually decreases with the use of a
scheduler. The complexity of our model requires the use of regularization
techniques. We applied dropout, with rate 0-0.2 for the outer layers
and 0.3-0.4 for the inner hidden layers. For the first epochs of the
training, the weights of the base transfer learning model remain
frozen. We unfreeze them when the learning process starts to
converge, dropping the learning rate at the same time. As we can see
below, unfreezing the base model on epoch 80 causes some
instability; however, after some epochs the loss returns to the low values
it had before the unfreezing. The pretrained encoder seems to work
properly without further training, but the unfreezing brings some
slight improvements, so we perform it. Training was executed on
Google Colab.</p>
      <p>Several versions of the problem are examined. Firstly, training
from scratch was done on the area of interest, using the baseline model
shown in figure 6. The produced maps had a resolution of
100 m. The visual results and the metrics for the validation set are
presented below:</p>
      <sec id="sec-6-1">
        <title>Kefalonia (target)</title>
      </sec>
      <sec id="sec-6-2">
        <title>Kefalonia (prediction)</title>
        <p>[Figure legend, level-2 CLC classes: 1.1 Urban fabric; 1.2 Industrial, commercial and transport units; 1.3 Mine, dump and construction sites; 1.4 Artificial, non-agricultural vegetated areas; 2.1 Arable land; 2.2 Permanent crops; 2.3 Pastures; 2.4 Heterogeneous agricultural areas; 3.1 Forest; 3.2 Shrub and/or herbaceous vegetation associations; 3.3 Open spaces with little or no vegetation; 4.1 Inland wetlands; 5.2 Marine waters]</p>
        <p>accuracy = 0.85329, f1_macro = 0.4124, f1_micro = 0.85329, f1_weighted = 0.8522</p>
        <p>[Classification report, level 1, partial: columns for 2. Agricultural areas and 3. Forest and seminatural areas; support 1922887 / 2332193; metrics 0.7729 / 0.8075 and 0.747 / 0.8537]</p>
        <p>[Figure legend, level-3 CLC classes: 1.1.2 Discontinuous urban fabric; 1.3.1 Mineral extraction sites; 1.4.2 Sport and leisure facilities; 2.1.1 Non-irrigated arable land; 2.2.3 Olive groves; 2.3.1 Pastures; 2.4.2 Complex cultivation patterns; 2.4.3 Land principally occupied by agriculture, with significant areas of natural vegetation; 3.1.2 Coniferous forest; 3.1.3 Mixed forest; 3.2.1 Natural grassland; 3.2.3 Sclerophyllous vegetation; 3.2.4 Transitional woodland/shrub; 3.3.2 Bare rock; 3.3.3 Sparsely vegetated areas; 5.2.3 Sea and ocean]</p>
        <p>accuracy = 0.88019, f1_macro = 0.559, f1_micro = 0.88019, f1_weighted = 0.89214</p>
        <p>[Fold 3: classification report]</p>
        <p>In the experiments presented above our method was to keep
a contiguous area as a validation set, for example a whole island.
Now we present a different approach, where the validation patches
are randomly distributed over the area of interest. This also
corresponds to a realistic problem, where experts sparsely assign land cover
labels over the area of interest and the remaining unlabeled areas are
predicted by a model trained on the neighbouring labeled ones. To
make sure that the training and the validation sets have no common
elements, we skipped data augmentation via overlaps, but the flips
and rotations are still used. We split the area of interest into train
and validation with a ratio of 70/30 respectively.</p>
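        <p>The random 70/30 assignment of non-overlapping patches can be sketched as:</p>

```python
import numpy as np

def random_split(num_patches, val_ratio=0.3, seed=0):
    """Randomly assign non-overlapping patch indices to train and
    validation sets with the given ratio."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_patches)
    n_val = int(round(val_ratio * num_patches))
    return order[n_val:], order[:n_val]   # train indices, val indices

train_idx, val_idx = random_split(100)
print(len(train_idx), len(val_idx))  # 70 30
```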
        <p>The metrics for the validation set are presented below:
f1_macro = 0.28225, f1_micro = 0.59871, f1_weighted = 0.62438</p>
        <p>Finally, we present some examples that show the
performance of our model. All the predictions presented are on
validation data. The number on the top of each image on the left
indicates the fold number (6-fold cross validation).</p>
        <p>Some of the above examples show the difficulty of our
problem, deriving from the low resolution of the input images and the
ambiguity of the CORINE labels. In some cases the model made the
right predictions, even though the task is difficult even for a
human observing the RGB input image.
• Our models provide a basis for the creation of land cover
maps based on the CLC nomenclature. The visual results
show the ability of our models to find the boundaries
between classes, and the accuracy on the higher levels of the
class hierarchy is good, as is the accuracy on common
subclasses. However, the performance in predicting
uncommon classes and in discriminating subclasses of the same
superclass on the lower levels of the CLC class hierarchy
isn't adequate, and human supervision may be needed for
this task.
• The CLC dataset contains imperfections, which limit the
accuracy of our models. However, the model can sometimes
outperform the dataset in cases where the dataset quality
is lower than its average.
• Often the land cover is mixed or cannot be described
accurately by the existing CLC classes. This leads to discord
between the labeled data and the predictions, even for kinds
of land cover that have been seen in the training set. We
also observe that sometimes there are multiple class labels
that could describe the land cover, and despite the seeming
disagreement between the model output and the labels they
are close to each other. This indicates the need for a more
sophisticated loss criterion and for performance metrics that give
different penalties to different types of confusion between
classes, taking into account the hierarchical structure of the
classes and the similarities and overlaps between classes.
• Increasing the resolution of the output from 100 m to 10 m
can give better results, but bigger models are required (more
parameters).
• The main contribution of transfer learning was speeding up
the training process and possibly improving the results.
The encoder part of the network didn't have to be trained,
at least for the first epochs of the training, resulting in
decreased epoch duration.
• Using a bigger dataset could boost the performance of our
models in the area of interest, especially in the task of
predicting uncommon classes.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sumbul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Charfuelan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Demir</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Markl</surname>
          </string-name>
          ,
          <article-title>BigEarthNet: A large-scale benchmark archive for remote sensing image understanding</article-title>
          .
          <source>IEEE International Conference on Geoscience and Remote Sensing Symposium</source>
          , pp.
          <fpage>5901</fpage>
          -
          <lpage>5904</lpage>
          ,
          <year>Jul 2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          Copernicus Programme,
          <article-title>CORINE Land Cover</article-title>
          . Last Accessed:
          <fpage>2020</fpage>
          -10-10.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Priit</given-names>
            <surname>Ulmas</surname>
          </string-name>
          , Innar Liiv,
          <article-title>Segmentation of Satellite Imagery using U-Net Models for Land Cover Classification</article-title>
          . arXiv:
          <year>2003</year>
          .02899
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Charou</surname>
          </string-name>
          , Eleni, George Felekis, Danai Bournou Stavroulopoulou, Maria Koutsoukou, Antigoni Panagiotopoulou, Yorghos Voutos, Emmanuel Bratsolis, Phivos Mylonas, and
          <string-name>
            <surname>Laurence</surname>
          </string-name>
          Likforman-Sulem.
          <article-title>Deep Learning for Agricultural Land Detection in Insular Areas</article-title>
          .
          <source>In 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA)</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Diakogiannis</surname>
            ,
            <given-names>Foivos</given-names>
          </string-name>
          &amp; Waldner, Francois &amp; Caccetta,
          <string-name>
            <surname>Peter</surname>
          </string-name>
          &amp; Wu,
          <string-name>
            <surname>Chen.</surname>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nivaggioli</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Randrianarivo</surname>
          </string-name>
          ,
          <article-title>Weakly supervised semantic segmentation of satellite images</article-title>
          .
          <source>CoRR</source>
          , vol. abs/
          <year>1904</year>
          .03983,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Xie</surname>
          </string-name>
          , G. Azzari, and
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Lobell</surname>
          </string-name>
          ,
          <article-title>Weakly supervised deep learning for segmentation of remote sensing imagery</article-title>
          .
          <source>Remote Sensing</source>
          , vol.
          <volume>12</volume>
          , no.
          <issue>2</issue>
          , p.
          <fpage>207</fpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ahn</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Kwak</surname>
          </string-name>
          ,
          <article-title>Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation</article-title>
          .
          <source>CoRR</source>
          , vol. abs/
          <year>1803</year>
          .10464,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Mohammadimanesh</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salehi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahdianpari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gill</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Molinier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2019</year>
          )
          <article-title>A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem</article-title>
          .
          <source>ISPRS Journal of Photogrammetry and Remote Sensing</source>
          ,
          <volume>151</volume>
          ,
          <fpage>223</fpage>
          -
          <lpage>236</lpage>
          . https://doi.org/10.1016/j.isprsjprs.
          <year>2019</year>
          .
          <volume>03</volume>
          .015
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Dimitrios</given-names>
            <surname>Marmanis</surname>
          </string-name>
          , Konrad Schindler, Jan Dirk Wegner, Silvano Galliani, Mihai Datcu, Uwe Stilla,
          <article-title>Classification With an Edge: Improving Semantic Image Segmentation with Boundary Detection</article-title>
          . arXiv:
          <volume>1612</volume>
          .
          <fpage>01337</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Buchhorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Smets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bertels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lesiv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.-E.</given-names>
            <surname>Tsendbazar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Herold</surname>
          </string-name>
          , and S. Fritz,
          <source>Copernicus Global Land Service: Land Cover 100m: epoch 2015: Globe</source>
          , Oct.
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Scott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>England</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Starms</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Marcum</surname>
          </string-name>
          and
<string-name>
  <given-names>C. H.</given-names>
  <surname>Davis</surname>
</string-name>
,
<article-title>Training Deep Convolutional Neural Networks for Land-Cover Classification of High-Resolution Imagery</article-title>
.
<source>IEEE Geoscience and Remote Sensing Letters</source>
, vol.
<volume>14</volume>
, no.
<issue>4</issue>
, pp.
<fpage>549</fpage>
-
<lpage>553</lpage>
,
<year>April 2017</year>
, doi: 10.1109/LGRS.2017.2657778
.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Benbahria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Smiej</surname>
          </string-name>
          ,
<string-name>
  <given-names>I.</given-names>
  <surname>Sebari</surname>
</string-name>
, and
<string-name>
  <given-names>H.</given-names>
  <surname>Hajji</surname>
</string-name>
,
<article-title>Land cover intelligent mapping using transfer learning and semantic segmentation</article-title>
.
<source>2019 7th Mediterranean Congress of Telecommunications (CMT)</source>
, pp.
<fpage>1</fpage>
-
<lpage>5</lpage>
,
<year>Oct. 2019</year>
.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kussul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lavreniuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Skakun</surname>
          </string-name>
, and
<string-name>
  <given-names>A.</given-names>
  <surname>Shelestov</surname>
</string-name>
,
<article-title>Deep learning classification of land cover and crop types using remote sensing data</article-title>
.
<source>IEEE Geoscience and Remote Sensing Letters</source>
, vol.
<volume>PP</volume>
, pp.
<fpage>1</fpage>
-
<lpage>5</lpage>
,
<year>Mar. 2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y. J. E.</given-names>
            <surname>Gbodjo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ienco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Leroux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Interdonato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gaetano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ndao</surname>
          </string-name>
, and
<string-name>
  <given-names>S.</given-names>
  <surname>Dupuy</surname>
</string-name>
, “
<article-title>Object-based multi-temporal and multi-source land cover mapping leveraging hierarchical class relationships</article-title>
,”
<year>2019</year>
.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
<string-name>
  <given-names>Y.</given-names>
  <surname>Gbodjo</surname>
</string-name>
,
<string-name>
  <given-names>L.</given-names>
  <surname>Leroux</surname>
</string-name>
,
<string-name>
  <given-names>R.</given-names>
  <surname>Gaetano</surname>
</string-name>
, and
<string-name>
  <given-names>B.</given-names>
  <surname>Ndao</surname>
</string-name>
(
<year>2019</year>
).
<article-title>RNN-based Multi-Source Land Cover mapping: An application to West African landscape</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
<string-name>
  <given-names>Z.</given-names>
  <surname>Sun</surname>
</string-name>
,
<string-name>
  <given-names>L.</given-names>
  <surname>Di</surname>
</string-name>
, and
<string-name>
  <given-names>H.</given-names>
  <surname>Fang</surname>
</string-name>
(
<year>2019</year>
)
<article-title>Using long short-term memory recurrent neural network in land cover classification on Landsat and Cropland data layer time series</article-title>
.
<source>International Journal of Remote Sensing</source>
, vol.
<volume>40</volume>
, no.
<issue>2</issue>
, pp.
<fpage>593</fpage>
-
<lpage>614</lpage>
, DOI: 10.1080/01431161.2018.1516313
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
<string-name>
  <given-names>Y.</given-names>
  <surname>Kim</surname>
</string-name>
,
<string-name>
  <given-names>N.-W.</given-names>
  <surname>Park</surname>
</string-name>
, and
<string-name>
  <given-names>K.-D.</given-names>
  <surname>Lee</surname>
</string-name>
,
<article-title>Self-Learning Based Land-Cover Classification Using Sequential Class Patterns from Past Land-Cover Maps</article-title>
          .
          <source>Remote Sens</source>
          .
          <year>2017</year>
          ,
          <volume>9</volume>
          ,
          <fpage>921</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B.</given-names>
            <surname>Demir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bovolo</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Bruzzone</surname>
          </string-name>
          ,
          <article-title>Updating Land-Cover Maps by Classification of Image Time Series: A Novel Change-Detection-Driven Transfer Learning Approach</article-title>
          .
          <source>in IEEE Transactions on Geoscience and Remote Sensing</source>
          , vol.
          <volume>51</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>300</fpage>
          -
          <lpage>312</lpage>
          , Jan.
          <year>2013</year>
, doi: 10.1109/TGRS.2012.2195727
.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
<string-name>
  <given-names>C.</given-names>
  <surname>Robinson</surname>
</string-name>
, A. Ortiz, K. Malkin, B. Elias, A. Peng, D. Morris, B. Dilkina, and N. Jojic,
<article-title>Human-Machine Collaboration for Fast Land Cover Mapping</article-title>
. arXiv:1906.04176
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
<string-name>
  <given-names>O.</given-names>
  <surname>Ronneberger</surname>
</string-name>
,
<string-name>
  <given-names>P.</given-names>
  <surname>Fischer</surname>
</string-name>
, and
<string-name>
  <given-names>T.</given-names>
  <surname>Brox</surname>
</string-name>
(
<year>2015</year>
).
<article-title>U-Net: Convolutional Networks for Biomedical Image Segmentation</article-title>
.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
<string-name>
  <given-names>R.</given-names>
  <surname>Stivaktakis</surname>
</string-name>
,
<string-name>
  <given-names>G.</given-names>
  <surname>Tsagkatakis</surname>
</string-name>
, and
<string-name>
  <given-names>P.</given-names>
  <surname>Tsakalides</surname>
</string-name>
(
<year>2019</year>
).
<article-title>Deep Learning for Multilabel Land Cover Scene Categorization Using Data Augmentation</article-title>
.
<source>IEEE Geoscience and Remote Sensing Letters</source>
, vol. PP, pp. 1-5, doi: 10.1109/LGRS.2019.2893306
.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>