Classifying land cover from satellite images using time series analytics Patrick Schäfer Dirk Pflugmacher Ulf Leser Computer Science Department Patrick Hostert Computer Science Department Humboldt-Universität zu Berlin Geography Department Humboldt-Universität zu Berlin Germany Humboldt-Universität zu Berlin leser@informatik.hu-berlin.de patrick.schaefer@hu-berlin.de [dirk.pflugmacher,patrick.hostert] @geo.hu-berlin.de ABSTRACT time, e.g. because of vegetation senescence or harvesting [27]. The Earth’s surface is continuously observed by satellites, lead- The task can be approached in different ways. In a typical base- ing to large multi-spectral image data sets of increasing spatial line setting, the different measurements per pixel are used as resolution and temporal density. One important application of independent features for a classical machine learning-based clas- satellite data is the mapping of land cover and land use changes sifier, such as Naive Bayes or Decision Trees [20]. In this ap- such as urbanization, deforestation, and desertification. This in- proach, the temporal order of the measurements is ignored as formation should be obtained automatically, with high accuracy, all features are treated as orthogonal dimensions of the feature and at the pixel level, which implies the need to classify millions space. An alternative method is to include the consideration of of pixels even when only small regions are studied. Balancing the order of measurements by using methods from time series runtime and accuracy for this task becomes even more challeng- analytics [2, 24]. Here, every pixel is considered as a temporally ing with the recent availability of multiple time points per pixel, ordered (and aligned) series of measurements, and the specific created by periodically performed satellite scans. In this paper changes (increasing or decreasing slopes, periodic changes etc.) we describe a novel approach to classify land cover from series of of the measurements over time are analyzed to find common- multi-spectral satellite images based on multivariate time series alities and to derive classification models. Previous works (see analytics. The main advantage of our method is that it inherently Section 3) have shown that this can be advantageous as land cover models the periodic changes (seasons, agriculture etc.) underly- are temporally variable and often follow characteristic temporal ing many types of land covers and that it is comparably robust to patterns, such as those imposed by seasons. However, given the noise. Compared to a classical feature-based classifier, our new enormous scale of the data to be classified, not only the accuracy method shows a slightly superior overall accuracy, with an in- of an approach is important, but also the runtime performance crease of up to 20% in accuracy for rare land cover classes, though has a critical role in any practical application. at the cost of notably increased runtime. The highest accuracy is In this work, we evaluate the recently proposed multivariate achieved by combining both approaches. time series classification algorithm WEASEL+MUSE [26] for land cover classification using temporally dense, medium-resolution satellite images. WEASEL+MUSE models multivariate time series 1 INTRODUCTION using the truncated Fourier transformation and discretizes mea- Monitoring changes in land usage is an important area of research surements, both of which to reduce noise, builds a rich feature as land cover is a key variable driving the Earth’s energy balance, space to capture also subsequences in the time series, is able to hydrological and carbon cycle, and the provisioning of natural exploit similar temporal subsequences even when appearing at resources and habitat [22]. Over the last three decades, satellite- very different offsets within a time series, and uses aggressive based Earth Observation (EO) programs have made tremendous feature selection to remove irrelevant features and thereby speed- progress in acquiring medium-resolution (10 − 100m) images up classification. Although WEASEL+MUSE was not developed around the globe systematically and with increasing frequency specifically for land cover classification, many of its aspects fit (revisit time). As a result, large volumes of medium-resolution nicely to the specificities of this domain, such as the inherent satellite images are now available free of charge, enabling au- noise reduction and the exploitation of repetitive behaviour. tomatic approaches to the identification of land usage and the We compare the prediction performance of WEASEL+MUSE detection of land surface changes over large areas. For example, with the performance of an established and popular machine the American Landsat 8 sensor images the Earth at 30-m spatial learning-based approach, Random Forests, using the same input resolution every 16 days, and two European Sentinel-2 sensors ac- features on 23 Landsat 8 images collected in 16 day-intervals over quire images with a revisit time of 5 days and a spatial resolution Reunion Island. The study region covers an area of 2866x2633 of 10 − 20 meters, which amounts to roughly 60 measurements pixels at 30~m spatial resolution. As reference dataset for model for more than 300 Billions pixels (excluding oceans) in a year. training and validation, we used a sample of 81714 pixels that The free availability of medium-resolution satellite image time had been manually classified into 9 land cover classes. series has spawned new possibilities for mapping land cover [9]. Our results indicate that time-series-based algorithms improve In the past, classification approaches operated on single images or land cover classification accuracy compared to non-temporal al- stacks of images (i.e. composite classification). Multi-date classifi- gorithms. Our time series algorithm WEASEL+MUSE achieved cation approaches exploit the notion that land cover can vary over higher classification accuracies than Random Forests. The im- provement in accuracy was most notable with rare and/or difficult © 2018 Copyright held by the owner/author(s). Published in the Workshop classes. Here, class-wise accuracies increased by 8 and 3 percent- Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna, Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted age points, respectively. Overall accuracy improved by 1%-point under the terms of the Creative Commons license CC-by-nc-nd 4.0. owing to the fact that the dominant classes were less affected by 10 the choice of algorithm. Interestingly, Random Forests captures applications and reference data. However, there has been a trend different signals in the data than the time-series-based approach, away from parametric statistical models to machine learning to as a simple Ensemble of both approaches further improved clas- deal with the complexity of input data and class legends. sification accuracy. However, this increased accuracy comes at It has only recently become feasible to build land cover map- the cost of an increased runtime. Thus, we will focus future work ping algorithms that exploit the temporal domain of entire pixel on improving the runtime of our method without loosing the time series with medium resolution data [12]. To this end, satel- advancements in prediction performance we observed. lite images typically first undergo a series of pre-processing steps, The rest of this paper is organized as follows: Section 2 pro- including the correction of atmospheric effects, geometric align- vides background on land cover classification using satellite im- ment and cloud and cloud-shadow masking. Once these steps ages. Section 3 presents the current state of the art in land cover are finished, spectral values of pixels can be traced over time to mapping and time series classification. Section 4 describes the identify and detect land surface changes such as deforestation [8], test dataset and the tested classification methods: Random Forests or urbanization [19]. and WEASEL+MUSE. Section 5 presents the experiments. 3 RELATED WORK 2 LAND COVER CLASSIFICATION FROM Although, time-series based classification (TSC) is a relatively SATELLITE IMAGES new area in remote sensing, the topic itself has a long tradi- tion and dozens of approaches exist (see [2, 24], for instance). The classification of satellite images to extract information on Time-series-based classifiers can broadly be categorized into two land cover and land use has a long history. A significant turning classes: Similarity-based methods use a similarity measure over point in terrestrial Earth Observation was the launch of Landsat-1 sequences, such as Dynamic Time Warping (DTW), to perform a in 1972 (then called Earth Resource Technology Satellite). For the point-wise comparison of two time series. In contrast, feature- first time, Landsat delivered systematic observations of the Earth based TSC rely on comparing features extracted from the different land surface for land monitoring, leading to new ways of machine- time series, typically generated from their substructures. assisted approaches for mapping land cover from space [7]. In the These two approaches are also the basis for methods in multi- 1990s and early 2000s, international and national agencies started variate time series classification (MTSC), where a time series is to adopt operational land cover mapping programs with Landsat not made of a single stream of values but by multiple streams, and Landsat-like data, e.g. in Europe (CORINE), USA (NLCD), each one usually synchronized in time. A popular similarity- Canada (EOSD), and Australia (NCAS-LCCP). Even decades later, based MTSC method is dynamic time warping (DTW) [21]. the Landsat program is still active today. Since February 2013, Feature-based MTSC can be grouped into those methods that Landsat 8 is taking images of the Earth at 30 m spatial resolution build feature from so-called shapelets [30], which are short and in 8 spectral bands every 16 days. The radiometric quality and maximally discriminative subsequences of the time series, and spectral resolution has greatly improved since the early satellites, methods using the bag-of-patterns (BOP) approach [3]. The stan- and because of new global acquisition strategies and on-board dard BOP model [17] break up a time series into windows, rep- storage and download capabilities, so has the sheer number of resent the corresponding subsequences within each window as available images. The number of medium spatial resolution (10 − discrete features (or words), and finally derives a classifier from 100 m) sensors has been increasing [4], thus dense time series of the frequencies of the words. Different approaches to TSC (and medium resolution are available for many parts of the globe. MTSC) differ not only in their accuracy on different data sets, but Multi-spectral sensors like Landsat record the sun’s energy also in their runtime for classification, which is a critical issue reflected by a surface in a few distinct spectral wavelengths when it comes to very large data sets as is the case of satellite (bands), e.g. blue, green, red in the visible spectrum (400 nm to images. For instance, we recently evaluated the runtime of 12 700 nm), near infrared (700 to 1100 nm), and short-wave infrared different state-of-the-art methods for (univariate) TSC and found (1100 to 3000 nm). Since land surfaces with different chemical and differences of up to three orders of magnitude [24, 25]. structural properties often absorb and reflect sunlight differently Most approaches to land cover classification rely on traditional and wavelength-dependent, information on land cover can be machine-learning methods (see previous section), and there have derived from these spectral bands. For example, water absorbs been only a few prior studies on using time series information. For much of the near-infrared radiation, so these wavelengths are example, [32] fitted harmonic functions to each satellite band and useful for discerning land-water boundaries that are not obvious used the fitted parameters as features in subsequent classification in visible light. Similarly, green vegetation absorbs much of the (which implies that it falls into the class of feature-based MTSC incoming radiation in the red spectrum while reflecting about methods). [14] found that including temporal information into a 50% of the radiation in the near-infrared spectrum. model can have a bigger impact on classification accuracy than Much of the past research on classification algorithms has fo- the choice of the particular classification algorithm. New time- cused on exploiting the spectral and spatial properties of land cov- series-based algorithms are needed to leverage the predictive ers, including artificial neural networks [1], decision trees [11], potential of satellite time series images [6]. support vector machines [15], and spatial segmentation algo- rithms [10]. Each algorithm has its strength and weakness with 4 METHODS respect to: the distributional assumptions made about the data, training requirements, computational complexity, and robust- 4.1 Description of Data ness to overfitting, data noise, and errors in training data. Also For our analysis, we used a public dataset taken from the TiSeLaC common to all algorithms is that the work is supervised, i.e., they (Time Series Land Cover Classification Challenge) [28]. This need an independent reference data set (i.e., land cover infor- dataset consists of time series of 23 Landsat 8 images collected in mation collected in the field or from air-photos) for training a 16 day-intervals over Reunion Island in 2014. Landsat data were model. It is fair to say, that no single algorithm works best for all provided by the French Pôle Thématique Surfaces Continentales 11 1.0 Urban Areas 4.2 Time Series Analytics Other built-up surfaces 0.8 Forests In general, a time series dataset contains N time series. Each time Sparse Vegetation series is associated with a class label y from a predefined set of Rocks and bare soil 0.6 Grassland labels Y. Time series classification (TSC) is the task of predicting a Sugarcane crops class label for a time series whose label is unknown. A classifier 0.4 Other crops is a function that is learned from a set of labelled time series Water NDVI (the training data), takes an unlabelled time series as input and 0.2 outputs a label. In this paper, each pixel should be labeled by one 0.0 of the nine reference land cover classes. The 23 Landsat 8 images contain 10 time series of spectral 0.2 features each of length 23. At the pixel level, this data can be interpreted as a multivariate time series (MTS). 0.4 0 5 10 15 20 A multivariate time series (MTS) T = {t 1 , . . . , tn } is an ordered time steps sequence of n ∈ N streams ti = (ti,1 , . . . ti,m ) ∈ Rm , i.e., m recorded values at each fixed time stamp. As we address MTS Figure 1: Normalized Difference Vegetation Index (NDVI) generated from satellites with a fixed revisit time, we can safely of 9 different land cover classes for our reference ignore the concrete time stamps. MTS are typically produced by dataset [28]. sensors recording data over time like motion captures, gestures, EEG signals, hand-written letters, sign language, or the multi- spectral image data sets captured by satellites. ID Land cover class Samples 1 Urban Areas 16000 4.3 Machine Learning Approach using 2 Other built-up surfaces 3236 Random Forests 3 Forests 16000 Random forests (RF) are an ensemble learning method widely used 4 Sparse Vegetation 16000 in Earth Observation (EO) for classifying land cover and land use 5 Rocks and bare soil 12942 (e.g. [31]). RF build ensembles of decision trees wherein each tree 6 Grassland 5681 is trained on randomly selected features of a bootstrapped train- 7 Sugarcane crops 7656 ing sample. Node splits are performed using a random subset of 8 Other crops 1600 predictor variables. Because of these random components, the 9 Water 2599 RF approach does not require tree pruning and is relatively in- Table 1: Land cover classes and distribution of reference sensitive to overfitting [5]. To predict a class label, a RF classifies samples [28]. the sample with all its decision trees and returns the mode of the predicted classes. A RF approach to satellite image / pixel classi- fication uses the concatenated spectral features of each pixel as input (this is a vector of length 230 and the 2 coordinates, lon- (THEIA), and atmospherically corrected, geometrically corrected, gitude and latitude). RF are a typical machine learning method and cloud-masked with the Multi-sensor Atmospheric Correction which do not consider any order of the features. If time series and Cloud Screening (MACCS) level 2A processor developed at the are used as features, each point in time is considered as an in- French National Space Agency (CNES). Data pre-processing and dependent feature and the order of measurements is ignored. temporal gap filling was performed using the iota21 Land Cover This implies that such methods are unable to reproduce serial processor developed by CESBIO2 . For each time step and pixel, correlations or to detect temporal trends in the data. ten spectral features were extracted, i.e., the seven reflectance bands and three vegetation indices: the Normalized Difference 4.4 Land Cover Classification with Vegetation Index (NDVI), the Normalized Difference Water Index WEASEL+MUSE (NDWI), and the Brightness Index (BI). Figure 1 shows examples WEASEL+MUSE (Word ExtrAction for time SEries cLassification + of NDVI time series for different land cover classes. MUltivariate Symbols and dErivatives) [26] is a state-of-the-art Reference land cover data were derived from two publicly MTS classifier that is composed of the building blocks depicted available dataset: the 2012 CORINE Land Cover (CLC) map and in Figure 2. It conceptually builds upon the univariate Bag-of- the 2014 farmers’ graphical land parcel registration (Régistre Patterns model applied to each dimension. In the BOP model [23], Parcellaire Graphique - RPG). The most significant classes for a time series is characterized by the frequency of occurrence of the study area were retained, and a spatial processing (aided by substructures. BOP-based algorithms build a classification model photo-interpretation) was performed to ensure consistency with by (1) extracting subsequences from a time series, (2) transform- image geometry. Finally, a pixel-based random sampling of this ing each subsequence (of real values, the measurements) into a dataset was applied to provide an almost balanced ground truth. discrete-valued word (a sequence of symbols over a fixed alpha- The final reference dataset consisted of a total of 81714 pixels bet), (3) building a feature vector from word counts (histogram), distributed over 9 classes (Table1). We split this reference dataset and (4) finally using a classification model from the machine randomly in half for model training and testing. learning repertoire on these feature vectors. Specifically, WEASEL+MUSE treats the pixel time series of the 1 http://tully.ups-tlse.fr/jordi/iota2.git 10 spectral features as a MTS with 10 dimensions. For each spec- 2 http://www.cesbio.ups-tlse.fr tral feature, subsequences using varying lengths are extracted, 12 Figure 2: WEASEL+MUSE Pipeline: Feature extraction, univariate BOP models and the multivariate BOP model. approximated using the truncated Fourier transform, and dis- e.g. 49% vs 51%. To combine the two sets of class probabilities cretized into words using equi-depth or equi-frequency binning. (2x9 class probabilities), we trained a RF model using the 18 A feature vector is built from the words (unigrams) and pairs class probabilities from the training dataset as predictors and the of words (bigrams) to obtain order awareness. Finally, features corresponding land cover class labels as response. are concatenated with the sensor id, to maintain a disjoint word space for each dimension. This high dimensional feature space 5 EXPERIMENTS is subsequently filtered using statistical feature selection (Chi- squared test); finally, a logistic regression classifier is trained, 5.1 Experimental setup assigning weights to characteristic word in each spectral band. Datasets: We evaluated our competitors using the described Because WEASEL+MUSE is multivariate, the algorithm can Landsat dataset. The dataset was randomly split into 40857 train- leverage the multi-spectral information of satellite time series. ing and 40857 test samples. The training dataset was used to This is an advantage over univariate time series models that train each classifier. All reported accuracies are based on the test operate on single indices (i.e. vegetation indices), as spectral- dataset. temporal patterns may differ from sensor band to sensor band. Competitors: We performed a series of experiments: The feature extraction and selection in WEASEL+MUSE make • Build RF on the training dataset and apply it to test dataset. it interesting for land cover recognition: • Build WEASEL+MUSE on the training dataset and apply • Features extraction: The words are derived from sub- it to test dataset. sequences extracted at multiple window lengths in each • Build the combined model on the RF- and MUSE-predicted spectral feature using the truncated Fourier transform class probabilities on the training dataset and apply it to and discretization. The Fourier transform reduces noise the test dataset. introduced by preprocessing, such as the cloud mask or Training: For WEASEL+MUSE we performed 10-fold cross- geometric alignment, and the different window lengths validation on the training dataset to optimize the parameters capture the seasonal trends at different time granularity. Bi- for the SFA quantization method (equi-depth or equi-frequency grams can capture seasonal trends, e.g., higher intensities binning). To perform the RF analysis, we used the R statistical in the spectrum in summer than in autumn. By extracting language and the RF package from [16]. We set the algorithm to subsequences from the pixel stream, the classifier allows √ build 1000 trees and randomly sampled p variables as candidates for small shifts in the time line, e.g. a delayed bloom of at each split (where p = 232, and p is the total number of predictor crops in some regions. variables). • Feature selection: The wide range of words consid- ered also introduces many irrelevant features. Therefore, 5.2 Results WEASEL+MUSE applies statistical feature selection to re- move irrelevant words from each class. These may be a Table 2 presents the class-wise accuracies and overall accuracy result of erroneous information introduced by the image (weighted and simple average F1-scores) on the test samples for capture or preprocessing. each of the three methods. Overall, the combined model had the highest F1-score of 91.1% followed by WEASEL+MUSE with The resulting feature vector is highly discriminative and contains 89.6% and RF with 89.0% (weighted averages). Thus, the choice of words that are characteristic for each class, which allows the use the classifier had a relatively small effect on the overall accuracy. of fast logistic regression classifier. To perform our analysis, we However, classes with high F1-scores were also better represented used the JAVA implementation available from [29]. in the training and test samples. The effect on overall accuracy was therefore higher when sample sizes were ignored. 4.5 Ensemble Approach When looking at the results in detail, all classifiers showed To understand whether there is value in combining the time- high F1-scores of about 90% for all but two classes: “other built- series-based approach and the feature-based approach, we up” and “other crops”. For these classes, the accuracy was only build and tested a third model based on the ensemble of ~60% and ~50%, respectively. While RF showed a high precision, WEASEL+MUSE and RF. Both approaches output class probabili- WEASEL+MUSE had a higher recall and F1-score, indicating that ties for each pixel and select the class with the highest probability. the temporal profile improved the detection of challenging classes. Pixels belonging to a unique spectral class may be associated with However, the largest improvement in accuracy was obtained from high class probabilities, whereas pixels with less distinct class combining the two classifiers, which further improved the F1- membership may have more equally distributed probabilities, score by up to 20 percentage points for the “other crops” class. 13 Random Forests (RF) WEASEL+MUSE Combined Land cover Precision Recall F1-score Precision Recall F1-score Precision Recall F1-score Urban 86.7% 91.1% 88.9% 87.6% 91.6% 89.6% 88.9% 91.6% 90.2% Other built-up 75.0% 52.6% 61.9% 73.8% 57.7% 64.8% 71.9% 61.6% 66.3% Forests 86.1% 93.1% 89.4% 87.9% 91.7% 89.8% 90.2% 92.5% 91.3% Sparse Vegetation 92.3% 93.5% 92.9% 90.5% 93.3% 91.8% 93.9% 94.6% 94.2% Barren 96.2% 95.3% 95.7% 96.0% 92.6% 94.3% 96.7% 95.9% 96.3% Grassland 83.1% 84.5% 83.8% 85.1% 86.5% 85.8% 87.1% 89.1% 88.1% Sugarcane 94.1% 94.1% 94.1% 94.1% 94.9% 94.5% 94.7% 95.3% 95.0% Other crops 95.2% 34.5% 50.6% 81.6% 45.3% 58.3% 85.3% 58.2% 69.2% Water 90.0% 78.1% 83.6% 91.3% 84.0% 87.5% 91.0% 86.6% 88.7% Weight. Average 89.4% 89.4% 89.0% 89.5% 89.6% 89.6% 91.1% 91.2% 91.1% Average 88.7% 79.6% 82.3% 87.5% 82.0% 84.0% 88.9% 85.1% 86.6% Table 2: Class-wise accuracy and overall accuracy based on test sample for RF, WEASEL+MUSE, and our combined model. Vegetation . Sugarcane Grassland Accuracy Sparse built-up Users Forests Other Barren Urban Water Other crops Urban 7274 544 118 59 24 79 90 142 59 86.7% Other built-up 163 840 12 41 4 8 16 5 31 75.0% Forests 273 60 7448 146 30 282 73 237 106 86.1% Sparse Vegetation 50 33 195 7494 226 33 8 9 72 92.3% Barren 5 7 15 209 6164 2 0 3 4 96.2% Grassland 123 60 137 48 2 2382 38 75 2 83.1% Sugarcane 69 20 50 0 0 32 3669 52 6 94.1% Other crops 1 1 11 0 0 0 1 276 0 95.2% Water 26 31 14 17 18 1 2 2 998 90.0% Producers Accuracy 91.1% 52.6% 93.1% 93.5% 95.3% 84.5% 94.2% 34.5% 78.1% Table 3: Confusion matrix from Random Forests classification. The confusion matrix for the RF (Table 3) gives a detailed 125 between the values of the same symbol. This noise reduc- picture. “other build-up” is often confused with “urban”, and tion is useful for applications like gesture recognition [25], but it “other crops” is often confused with the “other forests”, “urban” seems to be too aggressive here. or “grassland”. This might be a result of an under-representation of these land cover classes in the dataset, as these two classes are 5.3 Sensitivity Analysis the ones with the lowest number of instances (Table 1). On aver- To better judge the performance of the two classification methods, age WEASEL+MUSE required 5.8 ms for a single pixel prediction, we tested their sensitivity regarding the weighted F1-score to as a result of the feature extraction and selection phases prior to varying sizes of the training sample (Figure 3). Specifically, it classification. The RF took 2.7 ms on average per pixel for classifi- was unclear whether the superior performance of the time series cation. For the 7.5 Mio pixels of the study area, this translates into classifier could be replicated with small sample sizes. Starting a total, single-CPU runtime of about 5.4 hours for RF compared to with a random sample of 10% of the original training sample, we 12.2 hours with WEASEL+MUSE. WEASEL+MUSE obtained very iteratively increased the training sample size up to 100% , and promising accuracy for many classes. However we observed a tested all models using the test samples. The results show that limitation of WEASEL+MUSE and all Bag-of-Pattern approaches WEASEL+MUSE scored consistently higher than RF across all when applied in the context of land cover classification, namely sample sizes. The absolute difference was constant, indicating the discretization step introduced for noise reduction and to ob- that both approaches were equally sensitive to the sample size. tain words from real-valued sequences. For discretization, the value range is divided into bins, and each one is associated with 6 CONCLUSION a label. However, only a limited number of symbols, typically Our objective was to test a time-series-based classification ap- between 4 to 8, can be used to discretize the value range with- proach (WEASEL+MUSE) to earth observation time series for land out negatively impacting accuracy [17]. For the Landsat data cover classification and conventional machine learning approach the spectral range is between 0 and 1000 and the absolute dif- (Random Forests). We reported results of a series of experiments ference between pixels is important for classification. However, using 23 Landsat 8 images collected in 16 day-intervals over Re- after discretization using 8 symbols (i.e., a = [000 − 125] and union Island. The reference dataset consisted of a total of 81714 b = [151 − 250], . . . ) there can be a difference between 0 up to pixels distributed over 9 classes. Our key finding is that the use of temporal information improved the classification accuracy, 14 satellites. ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015), 115–128. [5] L. Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5–32. [6] L. Bruzzone and B. Demir. 2014. A review of modern approaches to classifica- tion of remote sensing data. In Land Use and Land Cover Mapping in Europe. Springer, 127–143. [7] J.R.W. Ellefsen, L. Gaydos. 1974. Computer Aided Mapping of Land Use ERTS Multispectral Scanner Data. First Pan American Congress on Photogrammetry. [8] K. Grogan, D. Pflugmacher, P. Hostert, R. Kennedy, and R. Fensholt. 2015. Cross- border forest disturbance and the role of natural rubber in mainland Southeast Asia using annual Landsat time series. Remote Sensing of Environment 169 (2015), 438–453. [9] M. Hansen and T. Loveland. 2012. A review of large area monitoring of land cover change using Landsat data. Remote sensing of Environment 122 (2012), 66–74. [10] G. Hay, G. Castilla, M. Wulder, and J. Ruiz. 2005. An automated object-based approach for the multiscale image segmentation of forest scenes. International Journal of Applied Earth Observation and Geoinformation 7, 4 (2005), 339–359. [11] C. Homer, C. Huang, L. Yang, B. Wylie, and M. Coan. 2004. Development of a 2001 national land-cover database for the United States. Photogrammetric Engineering & Remote Sensing 70, 7 (2004), 829–840. Figure 3: Overall accuracy of WEASEL+MUSE and Ran- [12] P. Hostert, P. Griffiths, S. van der Linden, and D. Pflugmacher. 2015. Time series analyses in a new era of optical satellite data. In Remote Sensing Time dom Forests (RF) classification with varying train sample Series. Springer, 25–41. sizes, i.e., 10%-100% of the training samples to build the [13] B. Jakimow, P. Griffiths, S. van der Linden, and P. Hostert. 2017. Mapping model and 100% to estimate accuracy. pasture management in the Brazilian Amazon from dense Landsat time series. Remote Sensing of Environment (2017). [14] K. Jia, X. Wei, X. Gu, Y. Yao, X. Xie, and B. Li. 2014. Land cover classification using Landsat 8 operational land imager data in Beijing, China. Geocarto International 29, 8 (2014), 941–951. but not for all land cover classes. The improvement in accuracy [15] T. Kuemmerle, O. Chaskovskyy, J. Knorn, V. Radeloff, I. Kruhlov, W. Keeton, was highest for rare and difficult classes. For these, a combined and P. Hostert. 2009. Forest cover change and illegal logging in the Ukrainian classifier was able to improve the F1-score even further. We used Carpathians in the transition period from 1988 to 2007. Remote Sensing of Environment 113, 6 (2009), 1194–1207. Random Forests because they are widely used and robust, but [16] A. Liaw, M. Wiener, et al. 2002. Classification and regression by randomForest. the ensemble would also work with other classifiers that output R news 2, 3 (2002), 18–22. [17] J. Lin, R. Khade, and Y. Li. 2012. Rotation-invariant similarity in time series class probabilities. Regarding classification times, the Random using bag-of-patterns representation. Journal of Intelligent Information Systems Forests approach was twice as fast as WEASEL+MUSE, as Ran- 39, 2 (2012), 287–315. dom Forests do not require feature extraction or selection, as [18] Nicola Di Mauro, Antonio Vergari, Teresa Maria Altomare Basile, Fabrizio G. Ventola, and Floriana Esposito. 2017. End-to-end Learning of Deep Spatio- opposed to time-series-based approaches. temporal Representations for Satellite Image Time Series Classification. In Further research is needed to understand how Proceedings of the ECML/PKDD Discovery Challenges. MUSE+WEASEL scales over larger areas, e.g. continental [19] S. Powell, W. Cohen, Z. Yang, J. Pierce, and M. Alberti. 2008. Quantification of impervious surface in the Snohomish water resources inventory area of scale. The complexity of land cover processes and their western Washington from 1972–2006. Remote Sensing of Environment 112, 4 spectral-temporal patterns will probably grow with increasing (2008), 1895–1908. [20] J. Ross Quinlan. 1986. Induction of decision trees. Machine learning 1, 1 (1986), area size. From an application point of view, our presented 81–106. method may be of particular interest for mapping agricultural [21] T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, land use patterns. Agricultural land can be highly dynamic and J. Zakaria, and E. Keogh. 2012. Searching and mining trillions of time series subsequences under dynamic time warping. In Proceedings of the 2012 ACM spectrally variable throughout the year [13]. This land use class SIGKDD International Conference on Knowledge Discovery and Data Mining. is therefore likely to benefit from time-series based approaches. ACM, 262–270. Future research could test the performance of our time-series [22] S. Running. 2008. Climate change - Ecosystem disturbance, carbon, and climate. Science 321, 5889 (2008), 652–653. based method for classifying a broader range of crop types and [23] P. Schäfer. 2015. The BOSS is concerned with time series classification in cropping cycles. The presented approaches essentially disregard the presence of noise. Data Mining and Knowledge Discovery 29, 6 (2015), 1505–1530. the spatial neighbourhood of a pixel. Experiments have shown [24] P. Schäfer and U. Leser. 2017. Benchmarking Univariate Time Series Classifiers. that including spatial information can improve classification In BTW 2017. 289–298. accuracies, similar to the results reported in [18]. [25] P. Schäfer and U. Leser. 2017. Fast and Accurate Time Series Classification with WEASEL. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (2017), 637–646. ACKNOWLEDGEMENTS [26] P. Schäfer and U. Leser. 2017. Multivariate Time Series Classification with WEASEL+MUSE. ArXiv e-prints (Nov. 2017). arXiv:1711.11343 This work was partly supported by the German Federal Ministry [27] C. Senf, P. Leitão, D. Pflugmacher, S. van der Linden, and P. Hostert. 2015. of Education and Research through the GeoMultiSens project Mapping land cover in complex Mediterranean landscapes using Landsat: Im- proved classification accuracies from integrating multi-seasonal and synthetic (grant no. 01IS14010B). imagery. Remote Sensing of Environment 156 (2015), 527–536. [28] TiSeLaC : Time Series Land Cover Classification Challenge. 2017. https: REFERENCES //sites.google.com/site/dinoienco/tiselc. (2017). [29] WEASEL+MUSE implementation. 2016. https://github.com/patrickzib/SFA/. [1] H. Bagan, Q. Wang, M. Watanabe, Y. Yang, and J. Ma. 2005. Land cover (2016). classification from MODIS EVI times-series data using SOM neural network. [30] M. Wistuba, J. Grabocka, and L. Schmidt-Thieme. 2015. Ultra-fast shapelets International Journal of Remote Sensing 26, 22 (2005), 4999–5012. for time series classification. arXiv preprint arXiv:1503.05018 (2015). [2] A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. Keogh. 2016. The Great Time [31] H. Yin, D. Pflugmacher, R. Kennedy, D. Sulla-Menashe, and P. Hostert. 2014. Series Classification Bake Off: An Experimental Evaluation of Recently Pro- Mapping annual land use and land cover changes using MODIS time series. posed Algorithms. Extended Version. Data Mining and Knowledge Discovery IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2016), 1–55. 7, 8 (2014), 3421–3427. [3] M. Baydogan and G. Runger. 2016. Time series representation and similarity [32] Z. Zhu and C. Woodcock. 2014. Continuous change detection and classification based on local autopatterns. Data Mining and Knowledge Discovery 30, 2 (2016), of land cover using all available Landsat data. Remote sensing of Environment 476–509. 144 (2014), 152–171. [4] A. Belward and J. Skøien. 2015. Who launched what, when and why; trends in global land-cover observation capacity from civilian earth observation 15