<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Temporal and spatial approaches for land cover classification.</article-title>
      </title-group>
      <abstract>
        <p>This paper describes a solution for the Time Series Land Cover Classification Challenge (TiSeLaC). Using features extracted from satellite image time series (SITS), each pixel, corresponding to a 30 m × 30 m area, can be classified into one of several general classes (urban area, forest, water, etc.). The following approaches are implemented and evaluated: classical data mining multiclass prediction, local context embedding, and extraction of the shapes of temporal dynamics. Different cross-validation schemes are also considered to evaluate the performance of the approaches.</p>
      </abstract>
      <kwd-group>
        <kwd>image classification</kwd>
        <kwd>satellite images time series</kwd>
        <kwd>land cover classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The Time Series Land Cover Classification Challenge (TiSeLaC) provides a
time series of 23 processed Landsat 8 images acquired in 2014 above Reunion
Island (2866 × 2633 pixels at 30 m spatial resolution), provided at level 2A.
Among the many land cover classes, the 9 most important ones are retained for
the task. Each pixel is described at every time point by the following 10 features:
– UltraBlue
– Blue
– Green
– Red
– NIR (Near-infrared)
– SWIR1 (Shortwave Infrared 1)
– SWIR2 (Shortwave Infrared 2)
– NDVI (Normalized Difference Vegetation Index)
– NDWI (Normalized Difference Water Index)
– BI (Brightness Index)
Pixel coordinates (roughly related to longitude and latitude) are also provided
for each point.</p>
      <p>Let n_features = 10 denote the number of original features provided and
n_periods = 23 denote the number of consecutive images in the time series.
Unlike other competitions<sup>3</sup> on classifying objects in aerial or satellite
images, where full images were given, here only a subset of the pixels is provided.
In addition, in other tasks classification is applied to segments or whole images,
while pixel-wise classification is required in this challenge.</p>
      <p>The next sections describe the temporal approach (Section 2) and the spatial
context approach (Section 3). All approaches are validated with different validation
schemes, which correspond to different methods of data preparation (details in
Section 4). Finally, Section 5 lists the scores for every approach, and Section
6 analyzes the advantages and limitations of the approaches.</p>
    </sec>
    <sec id="sec-2">
      <title>Temporal approaches</title>
      <p>
        Two methods are applied here: the first encodes the temporal shape of every
feature as a single category and combines the results into n_features features; the
second encodes each per-period feature snapshot as a single category and combines
the results into n_periods features. Both encodings are implemented with clustering.
The MiniBatchKMeans clustering algorithm is used, with the implementation from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] based on
paper [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Although clustering algorithms usually have many parameters to
tune, here only the number of clusters is varied.
      </p>
      <sec id="sec-2-1">
        <title>Clustering temporal shapes</title>
        <p>Every sequence of n_periods measurements of a single channel is treated as a
feature vector and encoded by a single cluster number, which yields n_features
new categorical features. As can be seen in Fig. 1, for n_clusters = 10
some of the temporal shapes are well separated.</p>
        <p>Alternatively, instead of temporal shapes, the feature sets of single periods are
clustered. Here the task is to find clusters in n_features-dimensional space. After
combining the clusters for each period, n_periods new categorical features are
obtained. A few examples of such clusters are shown in Fig. 2.</p>
        <p><sup>3</sup> http://dataring.ru/competitions/fpi_sk_competition/,
https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection,
https://www.kaggle.com/c/planet-understanding-the-amazon-from-space</p>
        <p>In each method, the cluster labels are combined and passed as categorical features to
the classifier (see Section 5).</p>
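        <p>The two encodings can be sketched as follows. This is only an illustrative sketch, not the code used for the challenge: the array layout (n_pixels, n_periods, n_features), the helper names encode_temporal_shapes and encode_snapshots, the synthetic data, and the choice to fit a single clustering model over the snapshots of all periods are assumptions made for the example.</p>
        <preformat>
```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def encode_temporal_shapes(X, n_clusters, seed=0):
    """Cluster the temporal shape of each feature separately.

    X has shape (n_pixels, n_periods, n_features) (assumed layout).
    Returns integer cluster labels of shape (n_pixels, n_features).
    """
    n_pixels, n_periods, n_features = X.shape
    codes = np.empty((n_pixels, n_features), dtype=np.int64)
    for c in range(n_features):
        km = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
        # each sample is the n_periods-long series of one channel
        codes[:, c] = km.fit_predict(X[:, :, c])
    return codes


def encode_snapshots(X, n_clusters, seed=0):
    """Cluster single-period feature vectors (assumption: one shared model).

    Returns integer cluster labels of shape (n_pixels, n_periods).
    """
    n_pixels, n_periods, n_features = X.shape
    km = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
    labels = km.fit_predict(X.reshape(n_pixels * n_periods, n_features))
    return labels.reshape(n_pixels, n_periods)


# synthetic stand-in for the challenge data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 23, 10))
shape_codes = encode_temporal_shapes(X_demo, n_clusters=5)
snapshot_codes = encode_snapshots(X_demo, n_clusters=5)
```
        </preformat>
        <p>In this sketch, shape_codes has one column per original feature and snapshot_codes has one column per period; either array can then be passed to a classifier as categorical input.</p>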
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Spatial context models</title>
      <sec id="sec-3-1">
        <title>Neighbors embedding</title>
        <p>Using spatial context is a crucial approach in image classification tasks, especially
for aerial and satellite images. Although the data are sparse, since immediate
neighboring pixels are usually not available, the following steps are applied to find the
nearest neighbors and to compute distribution properties over the area around each
pixel:
1. For every point:
– search for the n_neighbors nearest neighbors, ordered by L2 distance;
– fetch the neighbors' features F_{c,t,k}, where
• c ∈ {1, …, n_features} is the feature index,
• t ∈ {1, …, n_periods} is the time index,
• k ∈ {1, …, n_neighbors} is the neighbor index.
2. Add the features of the point itself.
3. Train the classifier on the new dataset with n_features × n_periods × (n_neighbors + 1)
features.</p>
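        <p>The steps above could be sketched as follows, assuming the per-pixel features are already flattened to n_features × n_periods columns. The helper name neighbors_embedding and the toy data are hypothetical; NearestNeighbors is taken from scikit-learn with its default Euclidean (L2) metric.</p>
        <preformat>
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def neighbors_embedding(coords, X, n_neighbors):
    """Append the features of the nearest neighbors to each point's own features.

    coords: (n_pixels, 2) pixel coordinates.
    X: (n_pixels, d) flattened features, d = n_features * n_periods.
    Returns an array of shape (n_pixels, d * (n_neighbors + 1)).
    """
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(coords)
    # calling kneighbors() without arguments queries the fitted points
    # and does not count a point as its own neighbor
    _, idx = nn.kneighbors()
    neighbor_features = X[idx].reshape(len(X), -1)
    return np.hstack([X, neighbor_features])


# toy example: four points on a line, one feature each
toy_coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]])
toy_X = np.array([[10.0], [20.0], [30.0], [40.0]])
toy_embedded = neighbors_embedding(toy_coords, toy_X, n_neighbors=1)
```
        </preformat>
        <p>Each row of toy_embedded holds the point's own feature followed by the feature of its closest neighbor; with more neighbors the extra columns are ordered by increasing distance.</p>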
        <p>See Section 5 for results.</p>
      </sec>
      <sec id="sec-3-2">
        <title>k-NN model</title>
        <p>Here non-parametric classification is applied: the class of a point is predicted from
the classes of its nearest neighbors, found using the provided coordinates.</p>
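        <p>A minimal sketch of such a coordinate-only model is given below. The synthetic coordinates, the two artificial classes, and the train/validation split are assumptions for illustration; KNeighborsClassifier is the scikit-learn estimator.</p>
        <preformat>
```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# synthetic pixel coordinates; the class depends only on location
coords = rng.uniform(0, 100, size=(600, 2))
labels = (coords[:, 0] > 50).astype(int)

# first 400 points for training, remaining 200 for validation
train_coords, val_coords = coords[:400], coords[400:]
train_labels, val_labels = labels[:400], labels[400:]

# 1-NN: each validation point takes the label of its closest training point
knn = KNeighborsClassifier(n_neighbors=1).fit(train_coords, train_labels)
val_pred = knn.predict(val_coords)
val_accuracy = knn.score(val_coords, val_labels)
```
        </preformat>
        <p>Because only coordinates are used, such a model relies entirely on labeled points being available close to the points to be predicted.</p>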
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Cross-validation schemes</title>
      <p>A good cross-validation scheme is a crucial part of building a robust and reusable
classification model. Here a few schemes were designed to score every approach.
First, the originally proposed scheme, with evenly distributed training and validation
points, is used. Further, to check how the number of labeled samples affects the accuracy
of prediction, sub-sampling schemes are used (see Fig. 3):
– sub-sample the training set with ratio 0.7
– sub-sample the training set with ratio 0.4
– sub-sample the training set with ratio 0.1</p>
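      <p>The sub-sampling schemes could be implemented along the following lines. The helper name subsample_train and the use of uniform random sampling without replacement are assumptions; the text does not specify how the sub-samples were drawn.</p>
      <preformat>
```python
import numpy as np


def subsample_train(train_idx, ratio, seed=0):
    """Keep a random fraction `ratio` of the training indices."""
    rng = np.random.default_rng(seed)
    n_keep = int(round(len(train_idx) * ratio))
    return np.sort(rng.choice(train_idx, size=n_keep, replace=False))


train_idx = np.arange(1000)  # hypothetical training indices
subsamples = {ratio: subsample_train(train_idx, ratio) for ratio in (0.7, 0.4, 0.1)}
```
      </preformat>
      <p>The validation indices are left untouched, so the scores for different ratios remain comparable.</p>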
      <p>Another scheme (see Fig. 4) aims at the spatial separation of training and
validation points and is implemented as follows:
– split the coordinate space into 100 (10 × 10) equal rectangles
– randomly split the rectangles between training and validation</p>
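      <p>A possible sketch of this rectangle-based split is shown below. The helper name rectangle_split, the grid built from the minimum and maximum coordinates, and the choice of sending about one third of the rectangles to validation are assumptions made for the example.</p>
      <preformat>
```python
import numpy as np


def rectangle_split(coords, n_side=10, val_fraction=1 / 3, seed=0):
    """Split points into train/validation by whole grid rectangles.

    Returns boolean masks (train_mask, val_mask) and the rectangle id of each point.
    """
    rng = np.random.default_rng(seed)
    mins = coords.min(axis=0)
    spans = coords.max(axis=0) - mins
    # per-axis cell index in 0..n_side-1; points on the upper edge go to the last cell
    cell = np.minimum((n_side * (coords - mins) / spans).astype(int), n_side - 1)
    rect_id = cell[:, 0] * n_side + cell[:, 1]
    # randomly choose which rectangles are used for validation
    shuffled = rng.permutation(n_side * n_side)
    n_val = int(round(val_fraction * n_side * n_side))
    val_mask = np.isin(rect_id, shuffled[:n_val])
    return ~val_mask, val_mask, rect_id


rng = np.random.default_rng(1)
demo_coords = rng.uniform(0, 1000, size=(5000, 2))
train_mask, val_mask, rect_id = rectangle_split(demo_coords)
```
      </preformat>
      <p>No rectangle contributes points to both sets, so validation points have no labeled training points in their own rectangle.</p>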
      <p>In every scheme, the validation set has the same size, equal to 1/3 of the
total sample.</p>
    </sec>
    <sec id="sec-5">
      <title>Experimental results</title>
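      <p>Predictions are scored with the F1 score using the 'weighted' option:</p>
      <disp-formula><tex-math><![CDATA[\mathrm{F1score}_{\mathrm{weighted}} = \frac{\sum_{c=1}^{C} \mathrm{F1score}(p_c, c) \cdot |c|}{N}]]></tex-math></disp-formula>
      <p>where <inline-formula><tex-math><![CDATA[C]]></tex-math></inline-formula> is the number of classes, <inline-formula><tex-math><![CDATA[c = 1, \ldots, C]]></tex-math></inline-formula> indexes the classes, <inline-formula><tex-math><![CDATA[p_c]]></tex-math></inline-formula> denotes the predictions for objects
from class <inline-formula><tex-math><![CDATA[c]]></tex-math></inline-formula>, and <inline-formula><tex-math><![CDATA[N = \sum_{c=1}^{C} |c|]]></tex-math></inline-formula> is the size of the test set.</p>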
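      <p>As a small worked example, the weighted score can be computed directly from its definition and compared with scikit-learn's f1_score using average="weighted". The helper name weighted_f1 and the toy labels are hypothetical.</p>
      <preformat>
```python
import numpy as np
from sklearn.metrics import f1_score


def weighted_f1(y_true, y_pred):
    """Per-class F1 scores averaged with weights equal to the class sizes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for c in np.unique(y_true):
        tp = np.sum(np.logical_and(y_pred == c, y_true == c))
        fp = np.sum(np.logical_and(y_pred == c, y_true != c))
        fn = np.sum(np.logical_and(y_pred != c, y_true == c))
        f1_c = 2 * tp / (2 * tp + fp + fn)
        total += f1_c * np.sum(y_true == c)  # weight by class size |c|
    return total / len(y_true)


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
manual_score = weighted_f1(y_true, y_pred)
sklearn_score = f1_score(y_true, y_pred, average="weighted")
```
      </preformat>
      <p>Here the per-class F1 scores are 0.8, 0.8 and 1.0 with class sizes 3, 2 and 1, so both values equal (0.8 · 3 + 0.8 · 2 + 1.0 · 1) / 6 = 5/6.</p>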
      <p>
        The following cross-validation results are shown in Table 1:
– Benchmark: as the simplest benchmark, the Extremely Randomized Trees
Classifier (ETC) ([
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], with the implementation from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) is trained without any
preprocessing of the original features.
– Temporal feature clusters: for ETC, the best result, obtained with n_clusters = 60, is
shown; the scores for different numbers of clusters are plotted in Fig. 5.
– Snapshot clusters: for ETC, the best result, obtained with n_clusters = 40, is shown;
the scores for different numbers of clusters are plotted in Fig. 6.
– Neighbors embedding: for ETC, the results for n_neighbors = 1, 4, 9 are
shown.
– Coordinate neighbors: using the Nearest Neighbors Classifier from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the
results for n_neighbors = 1, 2, 10 are shown.
      </p>
      <p>Although clustering discovered many patterns of seasonal reflectance
dynamics, it gives the worst results, even below the benchmark; different
clustering models and distance metrics might help. The best scores for these
methods are achieved with a high number of clusters (∼60), which suggests that
some important information is lost during clustering.</p>
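      <p>The benchmark setup might look roughly like the sketch below. The data are synthetic, and reading the label "ETC30" in Table 1 as 30 trees is an assumption; ExtraTreesClassifier, train_test_split and f1_score are the scikit-learn functions.</p>
      <preformat>
```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pixels, n_periods, n_features = 900, 23, 10

# synthetic stand-in: 3 classes whose features are shifted by the class index
y = rng.integers(0, 3, size=n_pixels)
X = rng.normal(size=(n_pixels, n_periods * n_features)) + 2.0 * y[:, None]

# hold out 1/3 of the points for validation, as in the schemes of Section 4
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=1 / 3, random_state=0, stratify=y
)

# assumption: "ETC30" denotes an ensemble of 30 trees
etc = ExtraTreesClassifier(n_estimators=30, random_state=0).fit(X_train, y_train)
benchmark_score = f1_score(y_val, etc.predict(X_val), average="weighted")
```
      </preformat>
      <p>The same fit and scoring steps apply unchanged when the raw features are replaced by cluster codes or by neighbor-embedded features.</p>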
      <p>Both spatial approaches outperform the benchmark, and the 1-Nearest
Neighbors Classifier shows the best result among all approaches. This approach is
very robust to a decrease in training set size: trained on only 10% of the points, it
outperforms the other approaches that employ the full data set.</p>
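      <p>Among the spatial approaches, Neighbors embedding with 9 neighbors shows the
best result on cross-validation with rectangles, where the k-Nearest Neighbors
Classifier is not applicable. This method can therefore be recommended for land
cover classification on a completely new area.</p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Weighted F1 scores of each method under the cross-validation schemes.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Method</th><th>Classifier</th><th>Original</th><th>Rectangles</th><th>Train 70%</th><th>Train 40%</th><th>Train 10%</th></tr>
          </thead>
          <tbody>
            <tr><td>Benchmark</td><td>ETC30</td><td>0.8893</td><td>0.7797</td><td>0.8824</td><td>0.8719</td><td>0.8409</td></tr>
            <tr><td>Cluster features time series</td><td>ETC30</td><td>0.8185</td><td>0.7063</td><td>0.7846</td><td>0.7746</td><td>0.7546</td></tr>
            <tr><td>Cluster snapshots</td><td>ETC30</td><td>0.8294</td><td>0.7314</td><td>0.8223</td><td>0.8116</td><td>0.7816</td></tr>
            <tr><td>Neighbors embedding 1</td><td>ETC30</td><td>0.9038</td><td>0.7975</td><td>0.8964</td><td>0.8834</td><td>0.8496</td></tr>
            <tr><td>Neighbors embedding 4</td><td>ETC30</td><td>0.9056</td><td>0.8141</td><td>0.8968</td><td>0.8883</td><td>0.8569</td></tr>
            <tr><td>Neighbors embedding 9</td><td>ETC30</td><td>0.9041</td><td>0.8171</td><td>0.8966</td><td>0.8862</td><td>0.8528</td></tr>
            <tr><td>Coordinates only</td><td>1-NN</td><td>0.9850</td><td>NA</td><td>0.9787</td><td>0.9663</td><td>0.9071</td></tr>
            <tr><td>Coordinates only</td><td>2-NN</td><td>0.9789</td><td>NA</td><td>0.9716</td><td>0.9553</td><td>0.8875</td></tr>
            <tr><td>Coordinates only</td><td>10-NN</td><td>0.9523</td><td>NA</td><td>0.9393</td><td>0.9158</td><td>0.8335</td></tr>
          </tbody>
        </table>
      </table-wrap>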
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Sculley</surname>
          </string-name>
          .
          <article-title>Web-scale k-means clustering</article-title>
          .
          <source>WWW 2010: Proceedings of the 19th Annual International World Wide Web Conference</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Geurts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Damien</given-names>
            <surname>Ernst</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Louis</given-names>
            <surname>Wehenkel</surname>
          </string-name>
          .
          <article-title>Extremely randomized trees</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>63</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>42</lpage>
          ,
          <year>Apr 2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>