=Paper=
{{Paper
|id=Vol-3207/paper8
|storemode=property
|title=Multimodal Crop Type Classification Fusing Multi-Spectral Satellite Time Series with Farmers Crop Rotations and Local Crop Distribution
|pdfUrl=https://ceur-ws.org/Vol-3207/paper8.pdf
|volume=Vol-3207
|authors=Valentin Barriere,Martin Claverie
|dblpUrl=https://dblp.org/rec/conf/cdceo/BarriereC22
}}
==Multimodal Crop Type Classification Fusing Multi-Spectral Satellite Time Series with Farmers Crop Rotations and Local Crop Distribution==
Valentin Barriere¹, Martin Claverie¹

¹ European Commission's Joint Research Centre, Via Fermi 2749, 21027 Ispra (VA), Italy

'''Abstract.''' Accurate, detailed, and timely crop type maps are highly valuable to institutions designing policies around the needs of citizens. Over the last decade, the amount of available data has increased dramatically, whether from Remote Sensing (Copernicus Sentinel-2 data) or directly from farmers (in-situ crop information collected over the years, including crop rotations). Nevertheless, most studies are restricted to a single modality (Remote Sensing data or crop rotations) and never fuse Earth Observation data with domain knowledge such as crop rotations. Moreover, when Earth Observation data are used, they are generally limited to a single year, ignoring the past years. In this context, we tackle a land use and crop type classification task using three data types, with a hierarchical deep learning algorithm that models the crop rotations like a language model, the satellite signals like a speech signal, and the local crop distribution as an additional context vector. We obtain very promising results compared to classical approaches, increasing Accuracy by 5.1 points in a 28-class setting (.948) and micro-F1 by 9.6 points in a 10-class setting (.887) restricted to a set of crops of interest selected by an expert. We finally propose a data-augmentation technique that allows the model to classify the crop before the end of the season, which works surprisingly well in a multimodal setting.

'''Keywords:''' Remote Sensing, Farmer's Rotations, Multimodal System, Hierarchical Model
==1. Introduction==

Timely and accurate crop type mapping provides valuable information for crop monitoring and production forecasting [1]. In-season crop type mapping serves not only to better estimate crop areas, but also to improve yield forecasting through crop-type-specific models. Crop type mapping is thus a key input of crop monitoring systems focusing on in-season forecasts of crop production.

High-spatial-resolution time series make it possible to determine the crop type at a sub-parcel level in most agricultural areas. Most remote sensing classification systems rely on supervised techniques, requiring in-situ crop identification surveys. If survey data are provided within the season, some systems [2] are designed to predict the crop type along the season with a given uncertainty, even while the crop cycle is ongoing; such surveys are expensive because of the need for labels from the current year to train a model, difficult to conduct at large scale, and in most cases delivered after the cropping season. There is therefore a high demand for crop type mapping that does not rely on survey data from the ongoing season. Such approaches, including the one proposed in this study, are based on models trained on past seasons and applied to the current one; in addition, we propose a data-augmentation method to obtain satisfying results earlier in the season.

CDCEO 2022: 2nd Workshop on Complex Data Challenges in Earth Observation, July 25, 2022, Vienna, Austria. valentin.barriere@ec.europa.eu (V. Barriere); martin.claverie@ec.europa.eu (M. Claverie). © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

'''Earth Observation-based crop type mapping.''' Machine learning classification methods have been widely tested to derive crop type maps from remote sensing data. Among the various methods, the Random Forest algorithm has proved its capacity to accurately identify crop types from large, non-parametric data sets [3]. Since 2015 and the launch of the first satellite of the Copernicus Sentinel-2 (S2) constellation, the perspective for large-scale crop type mapping has changed: the high spatial and temporal resolution of S2 provides an appropriate data set to distinguish crop types, based on their spectral and temporal signals, at parcel or sub-parcel level in most agricultural regions. Taking advantage of this capacity, operational systems have been deployed [4, 2, 5], combining Earth Observation (EO) data, in-situ observations and classification algorithms to deliver crop type maps at regional, country or continental scale [6].

'''Crop type mapping using deep learning.''' Recent progress in deep learning benefits crop type mapping applications. In [7], the authors classify crop types at the parcel level using data from French Brittany for the 2017 season. They compare a Transformer-Encoder [8] with a Long Short-Term Memory (LSTM) recurrent neural network [9], obtaining comparable results: the best accuracy (0.69) with the former and the best macro-F1 (0.59) with the latter. In [10], the authors design a parcel-level crop classifier using S2 and compare several approaches to model the signal, including a Transformer and an LSTM; they obtain overall accuracies between 0.85 and 0.92 with the LSTM, depending on the number of classes considered. A similar approach was run by [11] on 40k Central European parcels using S2; they propose a new early-classification mechanism that enhances a classical model with an additional stopping probability based on the information seen so far. Finally, [12] use the technique developed in [13], tackling crop classification at the pixel level, i.e. accounting for spatial variation to detect parcel boundaries; they use a CNN-LSTM network on S2 images to classify 17 crop types.

'''Modeling the crop rotation sequences.''' Crop rotation is a widely used agronomic technique for sustainable farming, preserving long-term soil quality. Good understanding and design of crop rotations are essential for sustainability and for mitigating the variability of agricultural productivity induced by climate change. The crop rotation depends on the farmer's management decisions, but some good practices are shared, which makes it possible to model crop rotation patterns [14]. Rotations remain nonetheless complex and unstable in time; changes may be related to, e.g., economic considerations (commodity prices) or administrative regulation (e.g., changes in subsidies). Expert-knowledge-based models are thus very limited and rarely accurate over large areas and long periods. Alternatively, estimating crop sequence probabilities without a priori knowledge, using survey data and hidden Markov models, has been demonstrated in France [?]. However, survey data are not always available. Relying on machine learning techniques, [15] use a Markov Logic model to predict the following year's crop in France, with an accuracy of 60%. In [16], the authors use deep neural networks to reach a maximum accuracy of 88% on a 6-class portion of the US Cropland Data Layer (CDL) dataset [17] over 12 years.

'''Motivation.''' Many works focus on using remote sensing to predict the crop type at pixel or parcel level using only the EO and in-situ observations of the current year. Nevertheless, they treat each year's signal as independent from the others. Other works use the crop rotations of the parcels to tackle pre-season prediction of the crop type, focusing on problems with few classes; in that case, too much information is obviously missing to reach high performance. As of 2022, we identified a single study combining crop rotations and satellite time-series data over several years: [18]. They present a methodology to derive a near-real-time Cropland Data Layer over major US agricultural states. The methodology is nonetheless restricted to a limited number of crop types and to a Random Forest classifier, while recent progress in deep learning has shown tremendous improvements on such data mining problems.

'''Contributions.''' We propose to model both the crop rotations and the S2 time-series signal in a multimodal way using a hierarchical Long Short-Term Memory (LSTM) network. The contribution is unique in its conception, as no prior work fuses the large amount of temporally fine-grained EO data with crop rotation analysis in an advanced deep learning method. The crop rotations and the S2 time series are further enhanced with the crop distributions of the neighboring fields from the previous year. The crop rotations are modeled over the years as words would be in a language model [19], helped by the S2 time-series data, which are modeled as if they were the prosody of the speaker. The high-level features we add on the last layer of the network can then be seen as the distribution of the words used by our speaker. Finally, we also propose a data-augmentation technique for in-season classification, which randomly crops the end of the RS time series; it allows the model to learn to classify the crop type without the whole time series, hence before the end of the season.

==2. Methodology==

===2.1. Dataset===

The study focuses on data acquired over the Netherlands, covering the period 2009-2020 for the crop type labels and parcel identification, and the period 2016-2020 for the S2 data.

'''Crop type data.''' The crop type data were obtained from the Dutch Land Parcel Identification System and GeoSpatial Aid Application, named Basisregistratie Percelen (BRP). Dutch farmers must annually record their field parcel boundaries and the associated cultivated crops.¹ The 12 yearly BRP layers (2009-2020) were merged through geographical polygon intersections. The output polygons correspond to the 12-year intersected areas and are associated with 386 crop codes. Polygons whose area is lower than half a hectare were discarded. The output product contains 974,000 polygons covering a total of 1,600 Mha.

For the evaluation, we propose three granularities of labels, using several aggregations led by a domain expert and yielding 386, 28 and 12 crop classes. The crop label categories for 2020, the year used as test set, follow a long-tailed class distribution, as shown for the 28-class aggregation in Figure 1.

¹ https://data.overheid.nl/data/dataset/basisregistratie-gewaspercelen-brp

Figure 1: Distributions of the crop types in the dataset. Green crops are the remaining ones for the 10-class evaluation.

'''Sentinel-2 data.''' The study relies on the analysis of optical Copernicus Sentinel-2 (S2) data. The S2 constellation provides observations with a minimum revisit of five days over ten spectral bands of the optical domain (460-2280 nm), with a spatial resolution of 10-20 m depending on the band.
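The expert-led label aggregation described above (386 codes collapsed into 28, 12, then 10 evaluation classes) can be sketched as a simple fine-to-coarse mapping; the crop names and groups below are invented placeholders, not the actual BRP codes:

```python
# Sketch of expert-led label aggregation: fine crop labels are mapped to
# coarser evaluation groups, and anything unmapped falls back to "other".
# All names here are hypothetical placeholders, not real BRP codes.
FINE_TO_COARSE = {
    "winter_wheat": "cereals",
    "spring_barley": "cereals",
    "silage_maize": "maize",
    "grain_maize": "maize",
    "starch_potato": "potatoes",
}

def aggregate(label: str, mapping=FINE_TO_COARSE, fallback="other") -> str:
    """Map a fine-grained crop label to its coarser evaluation class."""
    return mapping.get(label, fallback)

labels = ["winter_wheat", "grain_maize", "tulip"]
print([aggregate(l) for l in labels])  # ['cereals', 'maize', 'other']
```

The same pattern, applied with expert-defined tables of different sizes, yields each evaluation granularity.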
The data are processed up to surface reflectance (SR) Level-2A, accounting for atmospheric corrections and cloud/cloud-shadow screening with the sen2cor algorithm [20]. The data are available through the JEODPP platform [21]. Cloud-free SR data were processed to 20-m Leaf Area Index (LAI) and Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) using BV-NET [22] and the calibration settings of [23]. For each polygon, B4 (red band) SR, B8A (near-infrared band) SR, LAI and FAPAR were averaged at the polygon level using pixels in a 20-m inner buffer, in order to remove parcel edge effects.

'''Time series smoothing.''' Despite the cloud and cloud-shadow screening of L2A S2 products, noise remains in the resulting time series [24]. We applied a time-series outlier detection based on B4 (for omitted clouds) and B8A (for omitted cloud shadows) using the Hampel filter [25]. Flagged data were removed for the four variables. The filtered time series of the four variables were then smoothed with the Whittaker algorithm [26] as implemented by the World Food Programme.² Time series were first resampled and interpolated to a 2-day time step, and the Whittaker algorithm was then applied with V-curve optimization of the smoothing parameter. This yields 2-day smoothed time series of each of the four variables, from October of year N-1 to October of year N for the cropping season of year N.

² https://github.com/WFP-VAM/vam.whittaker

===2.2. Feature Extraction===

'''Crop types.''' The crop type labels contain 386 different types of crops over the 12 years of the study. We represent each crop as a one-hot vector of size V = 386, used as input to an embedding layer.

'''EO-based features.''' We integrate the EO time series spatially by averaging at the parcel level, then temporally using a sliding window of 30 days with a step of 15 days. For each parcel, this yields 25 windows per year for each Remote Sensing (RS) signal, which we integrate temporally using 7 statistical functionals: mean, standard deviation, 1st quartile, median, 3rd quartile, minimum and maximum. In total we obtain 7*4 = 28 features per window, leading to 700 features per year. With this configuration the windows overlap, which avoids losing information by breaking the signal dynamics, at the price of some redundancy in the features. On each window, each signal is summarized with statistical functionals, as is commonly done for speech data [27].
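The windowing above can be sketched in plain Python, assuming an already smoothed, regularly sampled series per signal; the quantile interpolation is one reasonable choice and may differ from the authors' implementation:

```python
import statistics

def quantile(sorted_vals, q):
    """Linear-interpolation quantile on a sorted list (0 <= q <= 1)."""
    n = len(sorted_vals)
    if n == 1:
        return sorted_vals[0]
    pos = q * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def window_functionals(series, win=15, step=8):
    """Summarize a regularly sampled series with 7 statistical functionals
    per overlapping window. With a 2-day sampling step, win=15 samples spans
    ~30 days and step=8 samples ~15 days, mirroring the paper's setup."""
    feats = []
    for start in range(0, max(len(series) - win + 1, 1), step):
        w = series[start:start + win]
        s = sorted(w)
        feats.append([
            statistics.mean(w),       # mean
            statistics.pstdev(w),     # standard deviation
            quantile(s, 0.25),        # 1st quartile
            quantile(s, 0.50),        # median
            quantile(s, 0.75),        # 3rd quartile
            min(w),                   # minimum
            max(w),                   # maximum
        ])
    return feats
```

Applied to the four smoothed variables (B4, B8A, LAI, FAPAR), each window contributes 7 × 4 = 28 features, and 25 windows give the 700 features per year used by the model.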
'''Spatial crop distribution.''' The spatial crop distribution was derived for the year 2019 (year N-1 with respect to the 2020 test set). For each polygon, we compute the total surface of each crop of the database within a 10-km circle and turn it into a percentage. This a-priori distribution of crops has been shown to be relatively stable in time, with only minor changes from year to year [28]. We round the probabilities at 10⁻⁴, so some small but non-null values become 0.

===2.3. Learning Model===

This section describes the learning model and how the features are integrated as observations.

'''Unimodal RNN-LSTM crop rotation model.''' We model the crop rotation at the yearly level using an LSTM trained like a language model: each crop can be seen as a token in a sentence, and a recurrent neural network is trained to predict the next word given the preceding ones. We first add an embedding layer that transforms the crop type c_t at time t into a vector (see Equation 1):

emb_t = f_e(c_t)   (1)

Then we feed this vector into the RNN to produce a hidden state h_t at time t (see Equation 2), which is used to predict the next crop c_{t+1} (see Equation 3).
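As a minimal stand-in for the LSTM language model over rotations, a count-based next-crop predictor illustrates the idea behind Equations 1-3: the LSTM replaces this count table with a learned state carrying the whole history. The crop names are invented for illustration:

```python
from collections import Counter, defaultdict

def fit_transitions(rotations):
    """Count crop-to-crop transitions over many parcels' rotation
    histories: a count-based analogue of P(c_{t+1} | c_t, ..., c_1)
    truncated to a single year of context."""
    table = defaultdict(Counter)
    for seq in rotations:
        for prev, nxt in zip(seq, seq[1:]):
            table[prev][nxt] += 1
    return table

def predict_next(table, crop):
    """Return the most frequent crop observed after `crop`, if any."""
    if not table[crop]:
        return None
    return table[crop].most_common(1)[0][0]

rotations = [
    ["potato", "wheat", "beet", "potato"],
    ["potato", "wheat", "barley"],
    ["wheat", "beet", "potato", "wheat"],
]
table = fit_transitions(rotations)
print(predict_next(table, "potato"))  # wheat
```

Unlike this bigram sketch, the LSTM conditions on the full rotation history and can be fed additional per-year inputs, which is what the multimodal extensions below exploit.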
h_t = LSTM_y(emb_t | h_{t-1})   (2)

P(c_{t+1} | c_t, ..., c_1) = f_c(h_t)   (3)

'''Features from the RS signal.''' Using only the past rotations to predict the following year's crop is very difficult, hence we add the available satellite information to make the model more robust. Firstly, we enhance the unimodal LSTM crop model by adding information from RS aligned at the year level, concatenating a unimodal RS vector with the crop embedding. Secondly, we process the RS signal beforehand with another RNN and concatenate the resulting unimodal RS vector with the crop embedding, in a hierarchical way; those networks are denoted with Hier- in their name.

'''Multimodal model with RS.''' For the first model, we integrate the RS features at the year level before the LSTM modeling the crop types. We feed the 700 features RS_t into a neural network layer f_rs to reduce their size, and then concatenate them with the crop embeddings before the LSTM (see Equation 4), using emb_MM_t instead of emb_t in Equation 2. This model is denoted LSTM_MM.

emb_MM_t = [emb_t, f_rs(RS_t)]   (4)

'''Bidirectional RNN-LSTM with attention to model the RS time series.''' The first model does not take into account the sequentiality of the RS signal. We correct this by processing the RS features at the year level with a first RNN before feeding their yearly representation into the second neural network modeling the crop types, leading to a hierarchical network [29]. This gives 28 features per window RS_tw, for a sequence length of 25 per year. We enhance the simple LSTM into a bidirectional LSTM (biLSTM) with a self-attention mechanism [30], following the assumption that some parts of the year are more important than others to discriminate the crop type. This model is denoted HierbiLSTM_MM.

The biLSTM is composed of two LSTMs, one reading the sequence forward and the other backward; the final hidden states are a concatenation of the forward and backward hidden states. For a sequence of inputs [RS_t1, ..., RS_tw] it outputs w hidden states [h_RS_t1, ..., h_RS_tw]. The attention layer computes a scalar weight u_tw for each h_RS_tw (see Equation 5) in order to aggregate them into the final state h_RS_t (see Equation 6).

u_tw = att(h_RS_tw)   (5)

h_RS_t = Σ_w u_tw h_RS_tw   (6)

'''Locally aggregated crop distributions.''' When classifying at the scale of a whole country, agricultural practices such as the types of crops in use can change. The distribution of crop types in a region is typically stable over the years and represents the kinds of crops expected in that part of the world. We integrate this local information by adding a vector representing the distribution over the crop types in a 10-km circle centered on the studied parcel. We add the distribution vector before the last layer because it is a high-level feature with respect to the task, and the deeper a layer, the higher-level its representations are w.r.t. the task [31]. We concatenate the hidden state h_t of the LSTM with the crop distribution vector d and mix them using two fully connected layers f_fc1 and f_fc2 (see Equation 7), obtaining h_d_t instead of h_t before the final fully connected layer f_c of Equation 3. This final model is denoted Final.

h_d_t = f_fc2(f_fc1([h_t, d]))   (7)

==3. Experiments and Results==

In this section we describe the experiments we ran with the different models and their results. Given the nature of our predictions, it can be useful to obtain them before the end of the farming season. We therefore used two prediction setups: an end-of-season configuration, where the neural network is fed with all the RS data of the year, and an early-classification configuration, where the RS data stop at different dates of the year. As a baseline, we use an LSTM processing the RS data and tagging at the year level, treating each year independently; this year-independent model achieves state-of-the-art results according to [7] and is denoted LSTM_YI.
===3.1. Experimental protocol===

We trained all the networks via mini-batch stochastic gradient descent, using the Adam optimizer [32] with a learning rate of 10⁻³ and a cross-entropy loss. The number of neurons of the crop embedding layer, of both RNN internal layers and of the fully connected RS layer f_rs, as well as the number of stacked LSTMs, were chosen by hyperparameter search. The sizes of the layers f_fc1 and f_fc2 are the same as that of the second RNN state h_t.

We trained our networks as for a sequence classification task, always with ten years of data. The labels from 2018 were used as training set, the labels from 2019 as development set, and the labels from 2020 as test set. All results presented hereafter refer to the analysis of 2020 crop types, based on models trained on the period 2009-2019 and thus independent from the 2020 crop type observations. We zero-padded when no RS data were available (before 2016). For the in-season classification model, we applied a data augmentation that randomly crops the end of the time series of each batch, starting from mid-March. All models were coded with the PyTorch library [33].

===3.2. Results===

We report results in two settings: the classical setting, where the network sees the whole year of RS signal, and an early-season setting, where the RS signal of the current season stops before the end of the season. To deal with unbalanced classes, we use unweighted (macro) F1, Precision and Recall, as well as Accuracy. We also use micro-F1, which is equivalent to Accuracy computed after removing some classes. We additionally present results for 10 classes, i.e. the 12-class setting without grassland and other crops (see Figure 1).

Model (modalities)    | 386-class P/R/F1/Acc | 28-class P/R/F1/Acc | 12-class P/R/F1/Acc | 10-class P/R/F1/m-F1
LSTM_Crop (C)         | 28.1/22.4/23.1/73.3  | 45.3/33.4/34.2/76.4 | 53.1/44.7/43.8/77.2 | 46.7/39.2/37.1/52.1
LSTM_YI [7] (RS)      | 14.3/9.8/9.9/72.5    | 53.1/45.1/45.9/88.5 | 75.0/66.3/67.7/90.4 | 72.1/62.2/63.7/80.3
LSTM_RS (RS)          | 13.7/11.8/10.8/72.5  | 49.7/47.0/44.1/87.4 | 70.6/69.4/65.3/89.0 | 67.3/65.3/60.7/76.0
HierbiLSTM_RS (RS)    | 10.7/10.0/9.0/78.7   | 48.0/48.1/44.5/88.7 | 72.7/71.0/67.7/90.7 | 69.3/66.6/63.0/79.1
LSTM_MM (RS+C)        | 32.5/26.1/26.2/86.8  | 63.3/57.4/56.7/91.8 | 79.9/78.5/78.2/93.2 | 77.9/75.2/75.4/85.1
HierbiLSTM_MM (RS+C)  | 42.0/33.8/35.1/88.5  | 68.9/62.8/63.2/93.5 | 84.1/80.8/81.3/94.5 | 82.3/77.9/78.8/87.8
Final (All)           | 41.0/33.3/34.3/89.7  | 71.4/62.7/63.2/93.8 | 85.5/81.2/82.6/94.8 | 84.1/78.3/80.2/88.7

Table 1: Results of the end-of-season classification models with different modalities (Remote Sensing, Crop Rotation, and Spatial Crop Distribution). The metrics are macro Precision, Recall and F1, as well as Accuracy and micro-F1 (m-F1).

===3.2.1. End-of-Season Classification===

The results of the end-of-season classification are given in Table 1. We tested different network configurations using different kinds of features. The best results are obtained with our final model, which uses information from the crop rotations, the S2 time series and the crop distribution of the surrounding fields (from 2019). At first glance, the model using only crop rotations still reaches an Accuracy of 73.3% on the 386-class problem, even though it uses no information from the current year to make its prediction.

Our RS models reach high results on the 386-class setting (up to 78.7% with the HierbiLSTM_RS model) because, contrary to most prior works, they also use RS data from the past years on the same parcel, which allows modeling a temporal context. Interestingly, the hierarchical setup with RS only allows reaching higher results on the 386-class configuration, going from an accuracy of 72.5 to 78.7 compared to LSTM_YI.

Finally, the local crop distribution vector brings a slight improvement, most visible in the 10-class configuration. However, it unexpectedly decreases the macro-F1 while increasing the Accuracy in the 386-class configuration. This can be interpreted as the model making more mistakes on non-frequent crops simply because it is globally better; one explanation is that the non-frequent crops are not all located in the same area, hence their distribution probability density is almost always approximated as 0.
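The rounding effect mentioned above, where a rare crop's neighborhood share is approximated as 0, can be seen in a sketch of the distribution-vector computation of Section 2.2; the neighborhood lookup (the 10-km circle) is assumed to be done beforehand, and the crops and areas are invented:

```python
def crop_distribution(neighbors, classes):
    """Turn the (crop, surface) pairs of the parcels found in a 10-km
    circle into an area-share vector over `classes`, rounded at 1e-4
    (so very rare crops end up with probability exactly 0)."""
    total = sum(area for _, area in neighbors)
    shares = {c: 0.0 for c in classes}
    for crop, area in neighbors:
        if crop in shares:
            shares[crop] += area
    return [round(shares[c] / total, 4) for c in classes]

# A tiny tulip parcel among large wheat/maize fields: its share of
# ~2.2e-5 rounds down to exactly 0.0 in the context vector d.
neighbors = [("wheat", 120.0), ("maize", 60.0), ("tulip", 0.004)]
d = crop_distribution(neighbors, ["wheat", "maize", "tulip"])
```

This is consistent with the interpretation above: crops that are rare everywhere contribute a context-vector entry of 0 regardless of where the parcel lies.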
===3.2.2. Toward In-Season Classification===

The RS signal shows good end-of-season results, but performance is known to degrade strongly when classifying during the season [11]. In this case, the crop-rotation-enhanced modality can help. For the in-season classification, we simply used our model trained over the whole year, with the data stopping at a given point of the year. In Figure 2, we compare the model using the RS signal only with the multimodal model. Note that we used the same "final" model to adapt to this noisy setup: the missing features, corresponding to unused months, were replaced by zeros. The results are thus preliminary, and poor performance is expected; a straightforward option would be to train a new model for each of the evaluated months of the in-season classification.

The multimodal model always outperforms the RS model, which is expected, especially at the beginning of the season when almost no information is available from the RS modality.
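The data augmentation used for the in-season models (random cropping of the end of the season, Section 3.1) can be sketched as truncating the current season's windows and zero-filling the remainder, so the feature tensor keeps a fixed size; the mid-March lower bound is expressed here as a minimum number of kept windows, a hypothetical parameter:

```python
import random

def crop_season(windows, min_keep=10, rng=random):
    """Randomly truncate the end of a season's window features and pad
    with zeros, simulating in-season prediction at an earlier date.
    `windows` is a list of per-window feature vectors; `min_keep` bounds
    how early the cut may fall (e.g. mid-March)."""
    n = len(windows)
    keep = rng.randint(min_keep, n)  # inclusive bounds
    pad = [[0.0] * len(windows[0]) for _ in range(n - keep)]
    return windows[:keep] + pad

season = [[1.0] * 28 for _ in range(25)]  # 25 windows of 28 features
augmented = crop_season(season, min_keep=10, rng=random.Random(0))
```

Applying a fresh random cut per batch exposes the model to many season lengths during training, which matches the zero-filled inputs it sees at in-season inference time.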
Figure 2: Comparison of early classification using different modalities, with/without data augmentation (m-F1 with 10 classes).

It is also interesting to note that the performance can drop below that of the unimodal crop model. This is likely because the model gives too much attention to the RS modality compared to the other ones, since the RS modality has a growing impact on performance as the season progresses. An option to counter this effect would be a gate that discards a noisy modality, as shown in [34, 35].

==4. Analysis==

For the sake of clarity, all analyses presented hereafter are limited to a set of crops of interest, corresponding to the 10-class setting. Details on the crops are provided in Figure 1.

'''High-precision examples.''' We present the results of our model on the subset of parcels where the precision is better than normal. From a crop monitoring perspective, this analysis can be very valuable: even if not 100% of the parcels are aggregated, the output may support a crop yield forecasting system, through the analysis of the crop-specific RS time series with the highest probability. We take the examples classified with a probability higher than 0.9 and compute metrics over them. Those examples represent a large part of the dataset: more than 536k parcels for the 12-class dataset and more than 148k for the 10-class dataset, representing respectively 90.0% of all the parcels and 76.5% of the parcels containing crops of interest. The results are shown in Table 2.

Thresh | 12-class P/F1/Acc | 10-class P/F1/m-F1
none   | 85.3/82.6/94.8    | 84.1/80.2/88.7
.9     | 93.7/88.2/98.2    | 93.1/86.4/95.8

Table 2: Results of the model using only the examples with a high probability for the predicted class.

'''In-season classification.''' We compare the vanilla model with the in-season classification model trained with our data-augmentation technique. The vanilla model only saw end-of-season examples during training, so it is normal that it performs worse in-season; this explains the drop in performance compared to a model taking only the crops into account. Surprisingly, the data augmentation does not help the RS-only model, but it allows the multimodal model to overtake the crop-only model in April.

==5. Conclusion and Future Works==

We presented an innovative study producing in-season crop mapping without relying on in-situ data of the current season. The approach relies on the analysis of several modalities: the crop rotations of the previous years, the Sentinel-2 time series of the previous and current years, and the previous year's local crop distribution in the neighboring parcels. A deep learning algorithm models all those modalities at different levels using a hierarchical LSTM. Firstly, we model the RS data with a bidirectional LSTM with attention, using a sliding window on the satellite signals and integrating them with statistical functionals as can be done for speech. Secondly, we feed the representation into another LSTM network modeling the crops as words and their rotations as sentences, as in a language model. Finally, we add a context vector on the last layer to provide information about the geographical location of the parcel. The methodology was tested over the croplands of the Netherlands, benefiting from 12 years of nationwide crop rotation data. More generally, our method outperforms by a large margin the classical state of the art using only an RNN or a Transformer to model one year of EO data.

Nevertheless, much room is left for future work. Adding more spectral bands to the EO data could improve the performance. The multimodality could be modeled better, at the level of the EO data using aligned or non-aligned multimodal time-series fusion models [36, 37], and at a higher level between static representations [34]. Finally, our model cannot be adapted to an unknown place where the crop rotations are not available; a domain adaptation method using few-shot learning could be useful in this case [38].

==References==

[1] I. Becker-Reshef, C. Justice, M. Sullivan, E. Vermote, C. Tucker, A. Anyamba, J. Small, E. Pak, E. Masuoka, J. Schmaltz, et al., Monitoring global croplands with coarse resolution earth observations: The Global Agriculture Monitoring (GLAM) project, Remote Sensing 2 (2010) 1589–1609.

[2] P. Defourny, S. Bontemps, N. Bellemans, C. Cara, G. Dedieu, E. Guzzonato, O. Hagolle, J. Inglada, L. Nicola, T. Rabaute, et al., Near real-time agriculture monitoring at national scale at parcel resolution: Performance assessment of the Sen2-Agri automated system in various cropping systems around the world, Remote Sensing of Environment 221 (2019) 551–568.

[3] M. Hansen, R. Dubayah, R. DeFries, Classification trees: an alternative to traditional land cover classifiers, International Journal of Remote Sensing 17 (1996) 1075–1081.

[4] J. Inglada, M. Arias, B. Tardy, O. Hagolle, S. Valero, D. Morin, G. Dedieu, G. Sepulcre, S. Bontemps, P. Defourny, et al., Assessment of an operational system for crop type map production using high temporal and spatial resolution satellite optical imagery, Remote Sensing 7 (2015) 12356–12379.

[5] D. M. Johnson, R. Mueller, et al., The 2009 Cropland Data Layer, Photogrammetric Engineering and Remote Sensing 76 (2010) 1201–1205.

[6] R. d'Andrimont, A. Verhegghen, G. Lemoine, P. Kempeneers, M. Meroni, M. van der Velde, From parcel to continental scale – a first European crop type map based on Sentinel-1 and LUCAS Copernicus in-situ observations, Remote Sensing of Environment 266 (2021) 112708.

[7] M. Rußwurm, S. Lefèvre, M. Körner, BreizhCrops: a satellite time series dataset for crop type identification, Time Series Workshop of the 36th ICML (2019).

[8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, arXiv:1706.03762 [cs] (2017).

[9] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780. doi:10.1162/neco.1997.9.8.1735.

[10] M. Rußwurm, M. Körner, Self-attention for raw optical satellite time series classification, ISPRS Journal of Photogrammetry and Remote Sensing 169 (2020) 421–435. doi:10.1016/j.isprsjprs.2020.06.006. arXiv:1910.10536.

[11] M. Rußwurm, R. Tavenard, S. Lefèvre, M. Körner, Early classification for agricultural monitoring from satellite time series, ICML AI4SocialGood workshop (2019). arXiv:1908.10283.

[12] M. Rußwurm, M. Körner, Multi-temporal land cover classification with sequential recurrent encoders, ISPRS International Journal of Geo-Information 7 (2018). doi:10.3390/ijgi7040129. arXiv:1802.02080.

[13] M. Rußwurm, M. Körner, Temporal vegetation modelling using long short-term memory networks for crop identification from medium-resolution multi-spectral satellite images, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops 2017-July (2017) 1496–1504. doi:10.1109/CVPRW.2017.193.

[14] S. Dogliotti, W. Rossing, M. Van Ittersum, ROTAT, a tool for systematically generating crop rotations, European Journal of Agronomy 19 (2003) 239–250.

[15] J. Osman, J. Inglada, J. F. Dejoux, Assessment of a Markov logic model of crop rotations for early crop mapping, Computers and Electronics in Agriculture 113 (2015) 234–243. doi:10.1016/j.compag.2015.02.015.

[16] R. Yaramasu, V. Bandaru, K. Pnvr, Pre-season crop type mapping using deep neural networks, Computers and Electronics in Agriculture 176 (2020) 105664. doi:10.1016/j.compag.2020.105664.

[17] C. Boryan, Z. Yang, R. Mueller, M. Craig, Monitoring US agriculture: The US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer program, Geocarto International 26 (2011) 341–358. doi:10.1080/10106049.2011.562309.

[18] D. M. Johnson, R. Mueller, Pre- and within-season crop type classification trained with archival land cover information, Remote Sensing of Environment 264 (2021) 112576.

[19] T. Mikolov, M. Karafiát, L. Burget, J. Černockỳ, S. Khudanpur, Recurrent neural network based language model, in: Eleventh Annual Conference of the International Speech Communication Association, 2010.

[20] J. Louis, V. Debaecker, B. Pflug, M. Main-Knorn, J. Bieniarz, U. Mueller-Wilm, E. Cadau, F. Gascon, Sentinel-2 Sen2Cor: L2A processor for users, in: Proceedings Living Planet Symposium 2016, Spacebooks Online, 2016, pp. 1–8.

[21] P. Soille, A. Burger, P. Hasenohr, P. Kempeneers, D. Rodriguez Aseretto, V. Syrris, V. Vasilev, D. Marchi, The JRC earth observation data and processing platform, Big Data From Space, Toulouse, France (2017).

[22] F. Baret, O. Hagolle, B. Geiger, P. Bicheron, B. Miras, M. Huc, B. Berthelot, F. Niño, M. Weiss, O. Samain, et al., LAI, FAPAR and FCover CYCLOPES global products derived from VEGETATION: Part 1: Principles of the algorithm, Remote Sensing of Environment 110.

[32] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations (2014) 1–13. arXiv:1412.6980.

[33] A. Paszke, S. Gross, F. Massa, A. Lerer, J.
Brad- (2007) 275–286. bury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, [23] M. Claverie, E. F. Vermote, M. Weiss, F. Baret, L. Antiga, et al., Pytorch: An imperative style, high- O. Hagolle, V. Demarez, Validation of coarse spatial performance deep learning library, Advances in resolution lai and fapar time series over cropland in neural information processing systems 32 (2019) southwest france, Remote Sensing of Environment 8026–8037. 139 (2013) 216–230. [34] J. Arevalo, T. Solorio, M. Montes-Y-Gómez, F. A. [24] M. Claverie, J. Ju, J. G. Masek, J. L. Dungan, E. F. Ver- González, Gated multimodal units for information mote, J.-C. Roger, S. V. Skakun, C. Justice, The har- fusion, in: 5th International Conference on Learn- monized landsat and sentinel-2 surface reflectance ing Representations, ICLR 2017 - Workshop Track data set, Remote Sensing of Environment 219 (2018) Proceedings, 2017. arXiv:1702.01992. 145–161. [35] M. Chen, S. Wang, P. P. Liang, T. Baltrušaitis, [25] R. K. Pearson, Outliers in process modeling and A. Zadeh, L.-P. Morency, Multimodal sentiment identification, IEEE Transactions on control sys- analysis with word-level fusion and reinforcement tems technology 10 (2002) 55–63. learning, in: Proceedings of the 19th ACM Interna- [26] P. H. Eilers, V. Pesendorfer, R. Bonifacio, Automatic tional Conference on Multimodal Interaction, 2017, smoothing of remote sensing data, in: 2017 9th pp. 163–171. International Workshop on the Analysis of Mul- [36] A. Zadeh, P. P. Liang, N. Mazumder, S. Poria, E. Cam- titemporal Remote Sensing Images (MultiTemp), bria, L.-P. Morency, Memory Fusion Network for IEEE, 2017, pp. 1–3. Multi-view Sequential Learning, in: AAAI, 2018. [27] B. Schuller, S. Steidl, A. Batliner, J. Hirschberg, J. K. arXiv:arXiv:1802.00927v1. Burgoon, A. Baird, A. Elkins, Y. Zhang, E. Coutinho, [37] J. Yang, Y. Wang, R. Yi, Y. Zhu, A. Rehman, K. Evanini, The INTERSPEECH 2016 Computa- A. Zadeh, S. Poria, L.-p. 
Morency, Unaligned tional Paralinguistics Challenge: Deception, Sincer- Human Multimodal Language Sequences (2020). ity & Native Language, in: Proceedings of the An- arXiv:arXiv:2010.11985v1. nual Conference of the International Speech Com- [38] M. Rußwurm, S. Wang, K. Marco, D. Lobell, Meta- munication Association, INTERSPEECH, 2016. Learning for Few-Shot Land Cover Classification, [28] F. A. Merlos, R. J. Hijmans, The scale dependency in: IEEE/CVF conference on computer vision and of spatial crop species diversity and its relation to pattern recognition workshops, 2019. temporal diversity, Proceedings of the National Academy of Sciences 117 (2020) 26176–26182. [29] I. V. Serban, A. Sordoni, Y. Bengio, A. Courville, J. Pineau, Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models (2015). URL: http://arxiv.org/abs/1507.04808. doi:10.1017/CBO9781107415324.004. arXiv:1507.04808. [30] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, Y. Bengio, End-to-end attention-based large vocabulary speech recognition, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2016-May (2016) 4945–4949. doi:10.1109/ICASSP.2016. 7472618. arXiv:1508.04395. [31] V. Sanh, T. Wolf, S. Ruder, H. Court, H. Row, A Hierarchical Multi-task Approach for Learning Em- beddings from Semantic Tasks, in: AAAI, 2018. arXiv:arXiv:1811.06031v1. [32] D. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, International Con-