=Paper=
{{Paper
|id=Vol-2083/paper-02
|storemode=property
|title=Classifying Land Cover from Satellite Images Using Time Series Analytics
|pdfUrl=https://ceur-ws.org/Vol-2083/paper-02.pdf
|volume=Vol-2083
|authors=Patrick Schäfer,Dirk Pflugmacher,Patrick Hostert,Ulf Leser
|dblpUrl=https://dblp.org/rec/conf/edbt/0001PHL18
}}
==Classifying Land Cover from Satellite Images Using Time Series Analytics==
Classifying land cover from satellite images using time series
analytics
Patrick Schäfer Dirk Pflugmacher Ulf Leser
Computer Science Department Patrick Hostert Computer Science Department
Humboldt-Universität zu Berlin Geography Department Humboldt-Universität zu Berlin
Germany Humboldt-Universität zu Berlin leser@informatik.hu-berlin.de
patrick.schaefer@hu-berlin.de [dirk.pflugmacher,patrick.hostert]
@geo.hu-berlin.de
ABSTRACT time, e.g. because of vegetation senescence or harvesting [27].
The Earth’s surface is continuously observed by satellites, lead- The task can be approached in different ways. In a typical base-
ing to large multi-spectral image data sets of increasing spatial line setting, the different measurements per pixel are used as
resolution and temporal density. One important application of independent features for a classical machine learning-based clas-
satellite data is the mapping of land cover and land use changes sifier, such as Naive Bayes or Decision Trees [20]. In this ap-
such as urbanization, deforestation, and desertification. This in- proach, the temporal order of the measurements is ignored as
formation should be obtained automatically, with high accuracy, all features are treated as orthogonal dimensions of the feature
and at the pixel level, which implies the need to classify millions space. An alternative method is to include the consideration of
of pixels even when only small regions are studied. Balancing the order of measurements by using methods from time series
runtime and accuracy for this task becomes even more challeng- analytics [2, 24]. Here, every pixel is considered as a temporally
ing with the recent availability of multiple time points per pixel, ordered (and aligned) series of measurements, and the specific
created by periodically performed satellite scans. In this paper changes (increasing or decreasing slopes, periodic changes etc.)
we describe a novel approach to classify land cover from series of of the measurements over time are analyzed to find common-
multi-spectral satellite images based on multivariate time series alities and to derive classification models. Previous works (see
analytics. The main advantage of our method is that it inherently Section 3) have shown that this can be advantageous as land cover
models the periodic changes (seasons, agriculture etc.) underly- are temporally variable and often follow characteristic temporal
ing many types of land covers and that it is comparably robust to patterns, such as those imposed by seasons. However, given the
noise. Compared to a classical feature-based classifier, our new enormous scale of the data to be classified, not only the accuracy
method shows a slightly superior overall accuracy, with an in- of an approach is important, but also the runtime performance
crease of up to 20% in accuracy for rare land cover classes, though has a critical role in any practical application.
at the cost of notably increased runtime. The highest accuracy is In this work, we evaluate the recently proposed multivariate
achieved by combining both approaches. time series classification algorithm WEASEL+MUSE [26] for land
cover classification using temporally dense, medium-resolution
satellite images. WEASEL+MUSE models multivariate time series
1 INTRODUCTION using the truncated Fourier transformation and discretizes mea-
Monitoring changes in land usage is an important area of research surements, both of which to reduce noise, builds a rich feature
as land cover is a key variable driving the Earth’s energy balance, space to capture also subsequences in the time series, is able to
hydrological and carbon cycle, and the provisioning of natural exploit similar temporal subsequences even when appearing at
resources and habitat [22]. Over the last three decades, satellite- very different offsets within a time series, and uses aggressive
based Earth Observation (EO) programs have made tremendous feature selection to remove irrelevant features and thereby speed-
progress in acquiring medium-resolution (10 − 100m) images up classification. Although WEASEL+MUSE was not developed
around the globe systematically and with increasing frequency specifically for land cover classification, many of its aspects fit
(revisit time). As a result, large volumes of medium-resolution nicely to the specificities of this domain, such as the inherent
satellite images are now available free of charge, enabling au- noise reduction and the exploitation of repetitive behaviour.
tomatic approaches to the identification of land usage and the We compare the prediction performance of WEASEL+MUSE
detection of land surface changes over large areas. For example, with the performance of an established and popular machine
the American Landsat 8 sensor images the Earth at 30-m spatial learning-based approach, Random Forests, using the same input
resolution every 16 days, and two European Sentinel-2 sensors ac- features on 23 Landsat 8 images collected in 16 day-intervals over
quire images with a revisit time of 5 days and a spatial resolution Reunion Island. The study region covers an area of 2866x2633
of 10 − 20 meters, which amounts to roughly 60 measurements pixels at 30~m spatial resolution. As reference dataset for model
for more than 300 Billions pixels (excluding oceans) in a year. training and validation, we used a sample of 81714 pixels that
The free availability of medium-resolution satellite image time had been manually classified into 9 land cover classes.
series has spawned new possibilities for mapping land cover [9]. Our results indicate that time-series-based algorithms improve
In the past, classification approaches operated on single images or land cover classification accuracy compared to non-temporal al-
stacks of images (i.e. composite classification). Multi-date classifi- gorithms. Our time series algorithm WEASEL+MUSE achieved
cation approaches exploit the notion that land cover can vary over higher classification accuracies than Random Forests. The im-
provement in accuracy was most notable with rare and/or difficult
© 2018 Copyright held by the owner/author(s). Published in the Workshop classes. Here, class-wise accuracies increased by 8 and 3 percent-
Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna,
Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted
age points, respectively. Overall accuracy improved by 1%-point
under the terms of the Creative Commons license CC-by-nc-nd 4.0. owing to the fact that the dominant classes were less affected by
10
the choice of algorithm. Interestingly, Random Forests captures applications and reference data. However, there has been a trend
different signals in the data than the time-series-based approach, away from parametric statistical models to machine learning to
as a simple Ensemble of both approaches further improved clas- deal with the complexity of input data and class legends.
sification accuracy. However, this increased accuracy comes at It has only recently become feasible to build land cover map-
the cost of an increased runtime. Thus, we will focus future work ping algorithms that exploit the temporal domain of entire pixel
on improving the runtime of our method without loosing the time series with medium resolution data [12]. To this end, satel-
advancements in prediction performance we observed. lite images typically first undergo a series of pre-processing steps,
The rest of this paper is organized as follows: Section 2 pro- including the correction of atmospheric effects, geometric align-
vides background on land cover classification using satellite im- ment and cloud and cloud-shadow masking. Once these steps
ages. Section 3 presents the current state of the art in land cover are finished, spectral values of pixels can be traced over time to
mapping and time series classification. Section 4 describes the identify and detect land surface changes such as deforestation [8],
test dataset and the tested classification methods: Random Forests or urbanization [19].
and WEASEL+MUSE. Section 5 presents the experiments.
3 RELATED WORK
2 LAND COVER CLASSIFICATION FROM Although, time-series based classification (TSC) is a relatively
SATELLITE IMAGES new area in remote sensing, the topic itself has a long tradi-
tion and dozens of approaches exist (see [2, 24], for instance).
The classification of satellite images to extract information on
Time-series-based classifiers can broadly be categorized into two
land cover and land use has a long history. A significant turning
classes: Similarity-based methods use a similarity measure over
point in terrestrial Earth Observation was the launch of Landsat-1
sequences, such as Dynamic Time Warping (DTW), to perform a
in 1972 (then called Earth Resource Technology Satellite). For the
point-wise comparison of two time series. In contrast, feature-
first time, Landsat delivered systematic observations of the Earth
based TSC rely on comparing features extracted from the different
land surface for land monitoring, leading to new ways of machine-
time series, typically generated from their substructures.
assisted approaches for mapping land cover from space [7]. In the
These two approaches are also the basis for methods in multi-
1990s and early 2000s, international and national agencies started
variate time series classification (MTSC), where a time series is
to adopt operational land cover mapping programs with Landsat
not made of a single stream of values but by multiple streams,
and Landsat-like data, e.g. in Europe (CORINE), USA (NLCD),
each one usually synchronized in time. A popular similarity-
Canada (EOSD), and Australia (NCAS-LCCP). Even decades later,
based MTSC method is dynamic time warping (DTW) [21].
the Landsat program is still active today. Since February 2013,
Feature-based MTSC can be grouped into those methods that
Landsat 8 is taking images of the Earth at 30 m spatial resolution
build feature from so-called shapelets [30], which are short and
in 8 spectral bands every 16 days. The radiometric quality and
maximally discriminative subsequences of the time series, and
spectral resolution has greatly improved since the early satellites,
methods using the bag-of-patterns (BOP) approach [3]. The stan-
and because of new global acquisition strategies and on-board
dard BOP model [17] break up a time series into windows, rep-
storage and download capabilities, so has the sheer number of
resent the corresponding subsequences within each window as
available images. The number of medium spatial resolution (10 −
discrete features (or words), and finally derives a classifier from
100 m) sensors has been increasing [4], thus dense time series of
the frequencies of the words. Different approaches to TSC (and
medium resolution are available for many parts of the globe.
MTSC) differ not only in their accuracy on different data sets, but
Multi-spectral sensors like Landsat record the sun’s energy
also in their runtime for classification, which is a critical issue
reflected by a surface in a few distinct spectral wavelengths
when it comes to very large data sets as is the case of satellite
(bands), e.g. blue, green, red in the visible spectrum (400 nm to
images. For instance, we recently evaluated the runtime of 12
700 nm), near infrared (700 to 1100 nm), and short-wave infrared
different state-of-the-art methods for (univariate) TSC and found
(1100 to 3000 nm). Since land surfaces with different chemical and
differences of up to three orders of magnitude [24, 25].
structural properties often absorb and reflect sunlight differently
Most approaches to land cover classification rely on traditional
and wavelength-dependent, information on land cover can be
machine-learning methods (see previous section), and there have
derived from these spectral bands. For example, water absorbs
been only a few prior studies on using time series information. For
much of the near-infrared radiation, so these wavelengths are
example, [32] fitted harmonic functions to each satellite band and
useful for discerning land-water boundaries that are not obvious
used the fitted parameters as features in subsequent classification
in visible light. Similarly, green vegetation absorbs much of the
(which implies that it falls into the class of feature-based MTSC
incoming radiation in the red spectrum while reflecting about
methods). [14] found that including temporal information into a
50% of the radiation in the near-infrared spectrum.
model can have a bigger impact on classification accuracy than
Much of the past research on classification algorithms has fo-
the choice of the particular classification algorithm. New time-
cused on exploiting the spectral and spatial properties of land cov-
series-based algorithms are needed to leverage the predictive
ers, including artificial neural networks [1], decision trees [11],
potential of satellite time series images [6].
support vector machines [15], and spatial segmentation algo-
rithms [10]. Each algorithm has its strength and weakness with
4 METHODS
respect to: the distributional assumptions made about the data,
training requirements, computational complexity, and robust- 4.1 Description of Data
ness to overfitting, data noise, and errors in training data. Also For our analysis, we used a public dataset taken from the TiSeLaC
common to all algorithms is that the work is supervised, i.e., they (Time Series Land Cover Classification Challenge) [28]. This
need an independent reference data set (i.e., land cover infor- dataset consists of time series of 23 Landsat 8 images collected in
mation collected in the field or from air-photos) for training a 16 day-intervals over Reunion Island in 2014. Landsat data were
model. It is fair to say, that no single algorithm works best for all provided by the French Pôle Thématique Surfaces Continentales
11
1.0 Urban Areas 4.2 Time Series Analytics
Other built-up surfaces
0.8
Forests In general, a time series dataset contains N time series. Each time
Sparse Vegetation series is associated with a class label y from a predefined set of
Rocks and bare soil
0.6 Grassland labels Y. Time series classification (TSC) is the task of predicting a
Sugarcane crops class label for a time series whose label is unknown. A classifier
0.4 Other crops is a function that is learned from a set of labelled time series
Water
NDVI
(the training data), takes an unlabelled time series as input and
0.2
outputs a label. In this paper, each pixel should be labeled by one
0.0 of the nine reference land cover classes.
The 23 Landsat 8 images contain 10 time series of spectral
0.2 features each of length 23. At the pixel level, this data can be
interpreted as a multivariate time series (MTS).
0.4
0 5 10 15 20 A multivariate time series (MTS) T = {t 1 , . . . , tn } is an ordered
time steps sequence of n ∈ N streams ti = (ti,1 , . . . ti,m ) ∈ Rm , i.e., m
recorded values at each fixed time stamp. As we address MTS
Figure 1: Normalized Difference Vegetation Index (NDVI) generated from satellites with a fixed revisit time, we can safely
of 9 different land cover classes for our reference ignore the concrete time stamps. MTS are typically produced by
dataset [28]. sensors recording data over time like motion captures, gestures,
EEG signals, hand-written letters, sign language, or the multi-
spectral image data sets captured by satellites.
ID Land cover class Samples
1 Urban Areas 16000 4.3 Machine Learning Approach using
2 Other built-up surfaces 3236 Random Forests
3 Forests 16000
Random forests (RF) are an ensemble learning method widely used
4 Sparse Vegetation 16000
in Earth Observation (EO) for classifying land cover and land use
5 Rocks and bare soil 12942
(e.g. [31]). RF build ensembles of decision trees wherein each tree
6 Grassland 5681
is trained on randomly selected features of a bootstrapped train-
7 Sugarcane crops 7656
ing sample. Node splits are performed using a random subset of
8 Other crops 1600 predictor variables. Because of these random components, the
9 Water 2599 RF approach does not require tree pruning and is relatively in-
Table 1: Land cover classes and distribution of reference sensitive to overfitting [5]. To predict a class label, a RF classifies
samples [28]. the sample with all its decision trees and returns the mode of the
predicted classes. A RF approach to satellite image / pixel classi-
fication uses the concatenated spectral features of each pixel as
input (this is a vector of length 230 and the 2 coordinates, lon-
(THEIA), and atmospherically corrected, geometrically corrected, gitude and latitude). RF are a typical machine learning method
and cloud-masked with the Multi-sensor Atmospheric Correction which do not consider any order of the features. If time series
and Cloud Screening (MACCS) level 2A processor developed at the are used as features, each point in time is considered as an in-
French National Space Agency (CNES). Data pre-processing and dependent feature and the order of measurements is ignored.
temporal gap filling was performed using the iota21 Land Cover This implies that such methods are unable to reproduce serial
processor developed by CESBIO2 . For each time step and pixel, correlations or to detect temporal trends in the data.
ten spectral features were extracted, i.e., the seven reflectance
bands and three vegetation indices: the Normalized Difference 4.4 Land Cover Classification with
Vegetation Index (NDVI), the Normalized Difference Water Index WEASEL+MUSE
(NDWI), and the Brightness Index (BI). Figure 1 shows examples WEASEL+MUSE (Word ExtrAction for time SEries cLassification +
of NDVI time series for different land cover classes. MUltivariate Symbols and dErivatives) [26] is a state-of-the-art
Reference land cover data were derived from two publicly MTS classifier that is composed of the building blocks depicted
available dataset: the 2012 CORINE Land Cover (CLC) map and in Figure 2. It conceptually builds upon the univariate Bag-of-
the 2014 farmers’ graphical land parcel registration (Régistre Patterns model applied to each dimension. In the BOP model [23],
Parcellaire Graphique - RPG). The most significant classes for a time series is characterized by the frequency of occurrence of
the study area were retained, and a spatial processing (aided by substructures. BOP-based algorithms build a classification model
photo-interpretation) was performed to ensure consistency with by (1) extracting subsequences from a time series, (2) transform-
image geometry. Finally, a pixel-based random sampling of this ing each subsequence (of real values, the measurements) into a
dataset was applied to provide an almost balanced ground truth. discrete-valued word (a sequence of symbols over a fixed alpha-
The final reference dataset consisted of a total of 81714 pixels bet), (3) building a feature vector from word counts (histogram),
distributed over 9 classes (Table1). We split this reference dataset and (4) finally using a classification model from the machine
randomly in half for model training and testing. learning repertoire on these feature vectors.
Specifically, WEASEL+MUSE treats the pixel time series of the
1 http://tully.ups-tlse.fr/jordi/iota2.git 10 spectral features as a MTS with 10 dimensions. For each spec-
2 http://www.cesbio.ups-tlse.fr tral feature, subsequences using varying lengths are extracted,
12
Figure 2: WEASEL+MUSE Pipeline: Feature extraction, univariate BOP models and the multivariate BOP model.
approximated using the truncated Fourier transform, and dis- e.g. 49% vs 51%. To combine the two sets of class probabilities
cretized into words using equi-depth or equi-frequency binning. (2x9 class probabilities), we trained a RF model using the 18
A feature vector is built from the words (unigrams) and pairs class probabilities from the training dataset as predictors and the
of words (bigrams) to obtain order awareness. Finally, features corresponding land cover class labels as response.
are concatenated with the sensor id, to maintain a disjoint word
space for each dimension. This high dimensional feature space 5 EXPERIMENTS
is subsequently filtered using statistical feature selection (Chi-
squared test); finally, a logistic regression classifier is trained, 5.1 Experimental setup
assigning weights to characteristic word in each spectral band. Datasets: We evaluated our competitors using the described
Because WEASEL+MUSE is multivariate, the algorithm can Landsat dataset. The dataset was randomly split into 40857 train-
leverage the multi-spectral information of satellite time series. ing and 40857 test samples. The training dataset was used to
This is an advantage over univariate time series models that train each classifier. All reported accuracies are based on the test
operate on single indices (i.e. vegetation indices), as spectral- dataset.
temporal patterns may differ from sensor band to sensor band. Competitors: We performed a series of experiments:
The feature extraction and selection in WEASEL+MUSE make • Build RF on the training dataset and apply it to test dataset.
it interesting for land cover recognition: • Build WEASEL+MUSE on the training dataset and apply
• Features extraction: The words are derived from sub- it to test dataset.
sequences extracted at multiple window lengths in each • Build the combined model on the RF- and MUSE-predicted
spectral feature using the truncated Fourier transform class probabilities on the training dataset and apply it to
and discretization. The Fourier transform reduces noise the test dataset.
introduced by preprocessing, such as the cloud mask or Training: For WEASEL+MUSE we performed 10-fold cross-
geometric alignment, and the different window lengths validation on the training dataset to optimize the parameters
capture the seasonal trends at different time granularity. Bi- for the SFA quantization method (equi-depth or equi-frequency
grams can capture seasonal trends, e.g., higher intensities binning). To perform the RF analysis, we used the R statistical
in the spectrum in summer than in autumn. By extracting language and the RF package from [16]. We set the algorithm to
subsequences from the pixel stream, the classifier allows √
build 1000 trees and randomly sampled p variables as candidates
for small shifts in the time line, e.g. a delayed bloom of at each split (where p = 232, and p is the total number of predictor
crops in some regions. variables).
• Feature selection: The wide range of words consid-
ered also introduces many irrelevant features. Therefore,
5.2 Results
WEASEL+MUSE applies statistical feature selection to re-
move irrelevant words from each class. These may be a Table 2 presents the class-wise accuracies and overall accuracy
result of erroneous information introduced by the image (weighted and simple average F1-scores) on the test samples for
capture or preprocessing. each of the three methods. Overall, the combined model had
the highest F1-score of 91.1% followed by WEASEL+MUSE with
The resulting feature vector is highly discriminative and contains 89.6% and RF with 89.0% (weighted averages). Thus, the choice of
words that are characteristic for each class, which allows the use the classifier had a relatively small effect on the overall accuracy.
of fast logistic regression classifier. To perform our analysis, we However, classes with high F1-scores were also better represented
used the JAVA implementation available from [29]. in the training and test samples. The effect on overall accuracy
was therefore higher when sample sizes were ignored.
4.5 Ensemble Approach When looking at the results in detail, all classifiers showed
To understand whether there is value in combining the time- high F1-scores of about 90% for all but two classes: “other built-
series-based approach and the feature-based approach, we up” and “other crops”. For these classes, the accuracy was only
build and tested a third model based on the ensemble of ~60% and ~50%, respectively. While RF showed a high precision,
WEASEL+MUSE and RF. Both approaches output class probabili- WEASEL+MUSE had a higher recall and F1-score, indicating that
ties for each pixel and select the class with the highest probability. the temporal profile improved the detection of challenging classes.
Pixels belonging to a unique spectral class may be associated with However, the largest improvement in accuracy was obtained from
high class probabilities, whereas pixels with less distinct class combining the two classifiers, which further improved the F1-
membership may have more equally distributed probabilities, score by up to 20 percentage points for the “other crops” class.
13
Random Forests (RF) WEASEL+MUSE Combined
Land cover Precision Recall F1-score Precision Recall F1-score Precision Recall F1-score
Urban 86.7% 91.1% 88.9% 87.6% 91.6% 89.6% 88.9% 91.6% 90.2%
Other built-up 75.0% 52.6% 61.9% 73.8% 57.7% 64.8% 71.9% 61.6% 66.3%
Forests 86.1% 93.1% 89.4% 87.9% 91.7% 89.8% 90.2% 92.5% 91.3%
Sparse Vegetation 92.3% 93.5% 92.9% 90.5% 93.3% 91.8% 93.9% 94.6% 94.2%
Barren 96.2% 95.3% 95.7% 96.0% 92.6% 94.3% 96.7% 95.9% 96.3%
Grassland 83.1% 84.5% 83.8% 85.1% 86.5% 85.8% 87.1% 89.1% 88.1%
Sugarcane 94.1% 94.1% 94.1% 94.1% 94.9% 94.5% 94.7% 95.3% 95.0%
Other crops 95.2% 34.5% 50.6% 81.6% 45.3% 58.3% 85.3% 58.2% 69.2%
Water 90.0% 78.1% 83.6% 91.3% 84.0% 87.5% 91.0% 86.6% 88.7%
Weight. Average 89.4% 89.4% 89.0% 89.5% 89.6% 89.6% 91.1% 91.2% 91.1%
Average 88.7% 79.6% 82.3% 87.5% 82.0% 84.0% 88.9% 85.1% 86.6%
Table 2: Class-wise accuracy and overall accuracy based on test sample for RF, WEASEL+MUSE, and our combined model.
Vegetation .
Sugarcane
Grassland
Accuracy
Sparse
built-up
Users
Forests
Other
Barren
Urban
Water
Other
crops
Urban 7274 544 118 59 24 79 90 142 59 86.7%
Other built-up 163 840 12 41 4 8 16 5 31 75.0%
Forests 273 60 7448 146 30 282 73 237 106 86.1%
Sparse Vegetation 50 33 195 7494 226 33 8 9 72 92.3%
Barren 5 7 15 209 6164 2 0 3 4 96.2%
Grassland 123 60 137 48 2 2382 38 75 2 83.1%
Sugarcane 69 20 50 0 0 32 3669 52 6 94.1%
Other crops 1 1 11 0 0 0 1 276 0 95.2%
Water 26 31 14 17 18 1 2 2 998 90.0%
Producers Accuracy 91.1% 52.6% 93.1% 93.5% 95.3% 84.5% 94.2% 34.5% 78.1%
Table 3: Confusion matrix from Random Forests classification.
The confusion matrix for the RF (Table 3) gives a detailed 125 between the values of the same symbol. This noise reduc-
picture. “other build-up” is often confused with “urban”, and tion is useful for applications like gesture recognition [25], but it
“other crops” is often confused with the “other forests”, “urban” seems to be too aggressive here.
or “grassland”. This might be a result of an under-representation
of these land cover classes in the dataset, as these two classes are 5.3 Sensitivity Analysis
the ones with the lowest number of instances (Table 1). On aver- To better judge the performance of the two classification methods,
age WEASEL+MUSE required 5.8 ms for a single pixel prediction, we tested their sensitivity regarding the weighted F1-score to
as a result of the feature extraction and selection phases prior to varying sizes of the training sample (Figure 3). Specifically, it
classification. The RF took 2.7 ms on average per pixel for classifi- was unclear whether the superior performance of the time series
cation. For the 7.5 Mio pixels of the study area, this translates into classifier could be replicated with small sample sizes. Starting
a total, single-CPU runtime of about 5.4 hours for RF compared to with a random sample of 10% of the original training sample, we
12.2 hours with WEASEL+MUSE. WEASEL+MUSE obtained very iteratively increased the training sample size up to 100% , and
promising accuracy for many classes. However we observed a tested all models using the test samples. The results show that
limitation of WEASEL+MUSE and all Bag-of-Pattern approaches WEASEL+MUSE scored consistently higher than RF across all
when applied in the context of land cover classification, namely sample sizes. The absolute difference was constant, indicating
the discretization step introduced for noise reduction and to ob- that both approaches were equally sensitive to the sample size.
tain words from real-valued sequences. For discretization, the
value range is divided into bins, and each one is associated with 6 CONCLUSION
a label. However, only a limited number of symbols, typically Our objective was to test a time-series-based classification ap-
between 4 to 8, can be used to discretize the value range with- proach (WEASEL+MUSE) to earth observation time series for land
out negatively impacting accuracy [17]. For the Landsat data cover classification and conventional machine learning approach
the spectral range is between 0 and 1000 and the absolute dif- (Random Forests). We reported results of a series of experiments
ference between pixels is important for classification. However, using 23 Landsat 8 images collected in 16 day-intervals over Re-
after discretization using 8 symbols (i.e., a = [000 − 125] and union Island. The reference dataset consisted of a total of 81714
b = [151 − 250], . . . ) there can be a difference between 0 up to pixels distributed over 9 classes. Our key finding is that the use
of temporal information improved the classification accuracy,
14
satellites. ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015),
115–128.
[5] L. Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5–32.
[6] L. Bruzzone and B. Demir. 2014. A review of modern approaches to classifica-
tion of remote sensing data. In Land Use and Land Cover Mapping in Europe.
Springer, 127–143.
[7] J.R.W. Ellefsen, L. Gaydos. 1974. Computer Aided Mapping of Land Use ERTS
Multispectral Scanner Data. First Pan American Congress on Photogrammetry.
[8] K. Grogan, D. Pflugmacher, P. Hostert, R. Kennedy, and R. Fensholt. 2015. Cross-
border forest disturbance and the role of natural rubber in mainland Southeast
Asia using annual Landsat time series. Remote Sensing of Environment 169
(2015), 438–453.
[9] M. Hansen and T. Loveland. 2012. A review of large area monitoring of land
cover change using Landsat data. Remote sensing of Environment 122 (2012),
66–74.
[10] G. Hay, G. Castilla, M. Wulder, and J. Ruiz. 2005. An automated object-based
approach for the multiscale image segmentation of forest scenes. International
Journal of Applied Earth Observation and Geoinformation 7, 4 (2005), 339–359.
[11] C. Homer, C. Huang, L. Yang, B. Wylie, and M. Coan. 2004. Development of
a 2001 national land-cover database for the United States. Photogrammetric
Engineering & Remote Sensing 70, 7 (2004), 829–840.
Figure 3: Overall accuracy of WEASEL+MUSE and Ran- [12] P. Hostert, P. Griffiths, S. van der Linden, and D. Pflugmacher. 2015. Time
series analyses in a new era of optical satellite data. In Remote Sensing Time
dom Forests (RF) classification with varying train sample Series. Springer, 25–41.
sizes, i.e., 10%-100% of the training samples to build the [13] B. Jakimow, P. Griffiths, S. van der Linden, and P. Hostert. 2017. Mapping
model and 100% to estimate accuracy. pasture management in the Brazilian Amazon from dense Landsat time series.
Remote Sensing of Environment (2017).
[14] K. Jia, X. Wei, X. Gu, Y. Yao, X. Xie, and B. Li. 2014. Land cover classification
using Landsat 8 operational land imager data in Beijing, China. Geocarto
International 29, 8 (2014), 941–951.
but not for all land cover classes. The improvement in accuracy [15] T. Kuemmerle, O. Chaskovskyy, J. Knorn, V. Radeloff, I. Kruhlov, W. Keeton,
was highest for rare and difficult classes. For these, a combined and P. Hostert. 2009. Forest cover change and illegal logging in the Ukrainian
classifier was able to improve the F1-score even further. We used Carpathians in the transition period from 1988 to 2007. Remote Sensing of
Environment 113, 6 (2009), 1194–1207.
Random Forests because they are widely used and robust, but [16] A. Liaw, M. Wiener, et al. 2002. Classification and regression by randomForest.
the ensemble would also work with other classifiers that output R news 2, 3 (2002), 18–22.
[17] J. Lin, R. Khade, and Y. Li. 2012. Rotation-invariant similarity in time series
class probabilities. Regarding classification times, the Random using bag-of-patterns representation. Journal of Intelligent Information Systems
Forests approach was twice as fast as WEASEL+MUSE, as Ran- 39, 2 (2012), 287–315.
dom Forests do not require feature extraction or selection, as [18] Nicola Di Mauro, Antonio Vergari, Teresa Maria Altomare Basile, Fabrizio G.
Ventola, and Floriana Esposito. 2017. End-to-end Learning of Deep Spatio-
opposed to time-series-based approaches. temporal Representations for Satellite Image Time Series Classification. In
Further research is needed to understand how Proceedings of the ECML/PKDD Discovery Challenges.
MUSE+WEASEL scales over larger areas, e.g. continental [19] S. Powell, W. Cohen, Z. Yang, J. Pierce, and M. Alberti. 2008. Quantification
of impervious surface in the Snohomish water resources inventory area of
scale. The complexity of land cover processes and their western Washington from 1972–2006. Remote Sensing of Environment 112, 4
spectral-temporal patterns will probably grow with increasing (2008), 1895–1908.
[20] J. Ross Quinlan. 1986. Induction of decision trees. Machine learning 1, 1 (1986),
area size. From an application point of view, our presented 81–106.
method may be of particular interest for mapping agricultural [21] T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu,
land use patterns. Agricultural land can be highly dynamic and J. Zakaria, and E. Keogh. 2012. Searching and mining trillions of time series
subsequences under dynamic time warping. In Proceedings of the 2012 ACM
spectrally variable throughout the year [13]. This land use class SIGKDD International Conference on Knowledge Discovery and Data Mining.
is therefore likely to benefit from time-series based approaches. ACM, 262–270.
Future research could test the performance of our time-series [22] S. Running. 2008. Climate change - Ecosystem disturbance, carbon, and climate.
Science 321, 5889 (2008), 652–653.
based method for classifying a broader range of crop types and [23] P. Schäfer. 2015. The BOSS is concerned with time series classification in
cropping cycles. The presented approaches essentially disregard the presence of noise. Data Mining and Knowledge Discovery 29, 6 (2015),
1505–1530.
the spatial neighbourhood of a pixel. Experiments have shown [24] P. Schäfer and U. Leser. 2017. Benchmarking Univariate Time Series Classifiers.
that including spatial information can improve classification In BTW 2017. 289–298.
accuracies, similar to the results reported in [18]. [25] P. Schäfer and U. Leser. 2017. Fast and Accurate Time Series Classification
with WEASEL. Proceedings of the 2017 ACM on Conference on Information and
Knowledge Management (2017), 637–646.
ACKNOWLEDGEMENTS [26] P. Schäfer and U. Leser. 2017. Multivariate Time Series Classification with
WEASEL+MUSE. ArXiv e-prints (Nov. 2017). arXiv:1711.11343
This work was partly supported by the German Federal Ministry [27] C. Senf, P. Leitão, D. Pflugmacher, S. van der Linden, and P. Hostert. 2015.
of Education and Research through the GeoMultiSens project Mapping land cover in complex Mediterranean landscapes using Landsat: Im-
proved classification accuracies from integrating multi-seasonal and synthetic
(grant no. 01IS14010B). imagery. Remote Sensing of Environment 156 (2015), 527–536.
[28] TiSeLaC : Time Series Land Cover Classification Challenge. 2017. https:
REFERENCES //sites.google.com/site/dinoienco/tiselc. (2017).
[29] WEASEL+MUSE implementation. 2016. https://github.com/patrickzib/SFA/.
[1] H. Bagan, Q. Wang, M. Watanabe, Y. Yang, and J. Ma. 2005. Land cover (2016).
classification from MODIS EVI times-series data using SOM neural network. [30] M. Wistuba, J. Grabocka, and L. Schmidt-Thieme. 2015. Ultra-fast shapelets
International Journal of Remote Sensing 26, 22 (2005), 4999–5012. for time series classification. arXiv preprint arXiv:1503.05018 (2015).
[2] A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. Keogh. 2016. The Great Time [31] H. Yin, D. Pflugmacher, R. Kennedy, D. Sulla-Menashe, and P. Hostert. 2014.
Series Classification Bake Off: An Experimental Evaluation of Recently Pro- Mapping annual land use and land cover changes using MODIS time series.
posed Algorithms. Extended Version. Data Mining and Knowledge Discovery IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
(2016), 1–55. 7, 8 (2014), 3421–3427.
[3] M. Baydogan and G. Runger. 2016. Time series representation and similarity [32] Z. Zhu and C. Woodcock. 2014. Continuous change detection and classification
based on local autopatterns. Data Mining and Knowledge Discovery 30, 2 (2016), of land cover using all available Landsat data. Remote sensing of Environment
476–509. 144 (2014), 152–171.
[4] A. Belward and J. Skøien. 2015. Who launched what, when and why; trends
in global land-cover observation capacity from civilian earth observation
15