=Paper=
{{Paper
|id=Vol-3207/paper8
|storemode=property
|title=Multimodal Crop Type Classification Fusing Multi-Spectral Satellite Time Series with Farmers Crop Rotations and Local Crop Distribution
|pdfUrl=https://ceur-ws.org/Vol-3207/paper8.pdf
|volume=Vol-3207
|authors=Valentin Barriere,Martin Claverie
|dblpUrl=https://dblp.org/rec/conf/cdceo/BarriereC22
}}
==Multimodal Crop Type Classification Fusing Multi-Spectral Satellite Time Series with Farmers Crop Rotations and Local Crop Distribution==
Valentin Barriere¹, Martin Claverie¹

¹ European Commission's Joint Research Centre, Via Fermi 2749, 21027 Ispra (VA), Italy

'''Abstract.''' Accurate, detailed, and timely crop type maps are highly valuable to institutions designing policies around the needs of citizens. Over the last decade, the amount of available data has increased dramatically, whether from Remote Sensing (Copernicus Sentinel-2 data) or directly from farmers (in-situ crop information collected over the years, including crop rotations). Nevertheless, most studies are restricted to a single modality (Remote Sensing data or crop rotations) and never fuse Earth Observation data with domain knowledge such as crop rotations. Moreover, when Earth Observation data are used, they are generally limited to a single year, ignoring the past years. In this context, we tackle a land use and crop type classification task using three data types, with a hierarchical deep learning algorithm that models the crop rotations like a language model, the satellite signals like a speech signal, and the local crop distribution as an additional context vector. We obtain very promising results compared to classical approaches, increasing Accuracy by 5.1 points in a 28-class setting (.948) and micro-F1 by 9.6 points in a 10-class setting (.887) restricted to a set of crops of interest selected by an expert. We finally propose a data-augmentation technique that allows the model to classify the crop before the end of the season, which works surprisingly well in a multimodal setting.

'''Keywords:''' Remote Sensing, Farmer's Rotations, Multimodal System, Hierarchical Model
==1. Introduction==

Timely and accurate crop type mapping provides valuable information for crop monitoring and production forecasting [1]. In-season crop type mapping serves not only to better estimate crop areas, but also to improve yield forecasting through crop-type-specific models. Crop type mapping is thus a key input of crop monitoring systems focusing on in-season forecasts of crop production.

High-spatial-resolution time series make it possible to determine the crop type at a sub-parcel level in most agricultural areas. Most remote sensing classification systems rely on supervised techniques, requiring in-situ crop identification surveys. If survey data are provided within the season, some systems [2] are designed to predict the crop type along the season with a given uncertainty, even while the crop cycle is ongoing; such surveys are expensive because of the need for labels from the current year to train a model, difficult to conduct at large scale, and in most cases delivered after the cropping season. There is therefore a high demand for crop type mapping that does not rely on survey data from the ongoing season. Such approaches, including the one proposed in this study, are based on models trained on past seasons and applied to the current one; in addition, we propose a data-augmentation method to obtain satisfying results earlier in the season.

CDCEO 2022: 2nd Workshop on Complex Data Challenges in Earth Observation, July 25, 2022, Vienna, Austria. valentin.barriere@ec.europa.eu (V. Barriere); martin.claverie@ec.europa.eu (M. Claverie). © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

'''Earth Observation-based crop type mapping.''' Machine learning classification methods have been widely tested to derive crop type maps from remote sensing data. Among the various methods, the Random Forest algorithm has proved its capacity to accurately identify crop types from large, non-parametric data sets [3]. Since 2015 and the launch of the first satellite of the Copernicus Sentinel-2 (S2) constellation, the perspective for large-scale crop type mapping has changed: the high spatial and temporal resolution of S2 provides an appropriate data set to distinguish crop types, based on their spectral and temporal signals, at parcel or sub-parcel level in most agricultural regions. Taking advantage of this capacity, operational systems have been deployed [4, 2, 5], combining Earth Observation (EO) data, in-situ observations and classification algorithms to deliver crop type maps at regional, country or continental scale [6].

'''Crop type mapping using deep learning.''' Recent progress in deep learning benefits crop type mapping applications. In [7], the authors classify crop types at the parcel level using data from French Brittany for the 2017 season. They compare a Transformer-Encoder [8] with a Long Short-Term Memory (LSTM) recurrent neural network [9], obtaining comparable results: the best accuracy (0.69) with the former and the best macro-F1 (0.59) with the latter. In [10], the authors design a parcel-level crop classifier using S2 and compare several approaches to model the signal, including a Transformer and an LSTM; they obtain overall accuracies between 0.85 and 0.92 with the LSTM, depending on the number of classes considered. A similar approach was run by [11] on 40k Central European parcels using S2; they propose a new early-classification mechanism that enhances a classical model with an additional stopping probability based on the information seen so far. Finally, [12] use the technique developed in [13], tackling crop classification at the pixel level, i.e. accounting for spatial variation to detect parcel boundaries; they use a CNN-LSTM network on S2 images to classify 17 crop types.

'''Modeling the crop rotation sequences.''' Crop rotation is a widely used agronomic technique for sustainable farming, preserving long-term soil quality. Good understanding and design of crop rotations are essential for sustainability and for mitigating the variability of agricultural productivity induced by climate change. The crop rotation depends on the farmer's management decisions, but some good practices are shared, which makes it possible to model crop rotation patterns [14]. Rotations remain nonetheless complex and unstable in time; changes may be related to, e.g., economic considerations (commodity prices) or administrative regulation (e.g., changes in subsidies). Expert-knowledge-based models are thus very limited and rarely accurate over large areas and long periods. Alternatively, estimating crop sequence probabilities without a priori knowledge, using survey data and hidden Markov models, has been demonstrated in France [?]. However, survey data are not always available. Relying on machine learning techniques, [15] use a Markov Logic model to predict the following year's crop in France, with an accuracy of 60%. In [16], the authors use deep neural networks to reach a maximum accuracy of 88% on a 6-class portion of the US Cropland Data Layer (CDL) dataset [17] over 12 years.

'''Motivation.''' Many works focus on using remote sensing to predict the crop type at pixel or parcel level using only the EO and in-situ observations of the current year. Nevertheless, they treat each year's signal as independent from the others. Other works use the crop rotations of the parcels to tackle pre-season prediction of the crop type, focusing on problems with few classes; in that case, too much information is obviously missing to reach high performance. As of 2022, we identified a single study combining crop rotations and satellite time-series data over several years: [18]. They present a methodology to derive a near-real-time Cropland Data Layer over major US agricultural states. The methodology is nonetheless restricted to a limited number of crop types and to a Random Forest classifier, while recent progress in deep learning has shown tremendous improvements on such data mining problems.

'''Contributions.''' We propose to model both the crop rotations and the S2 time-series signal in a multimodal way using a hierarchical Long Short-Term Memory (LSTM) network. The contribution is unique in its conception, as no prior work fuses the large amount of temporally fine-grained EO data with crop rotation analysis in an advanced deep learning method. The crop rotations and the S2 time series are further enhanced with the crop distributions of the neighboring fields from the previous year. The crop rotations are modeled over the years as words would be in a language model [19], helped by the S2 time-series data, which are modeled as if they were the prosody of the speaker. The high-level features we add on the last layer of the network can then be seen as the distribution of the words used by our speaker. Finally, we also propose a data-augmentation technique for in-season classification, which randomly crops the end of the RS time series; it allows the model to learn to classify the crop type without the whole time series, hence before the end of the season.

==2. Methodology==

===2.1. Dataset===

The study focuses on data acquired over the Netherlands, covering the period 2009-2020 for the crop type labels and parcel identification, and the period 2016-2020 for the S2 data.

'''Crop type data.''' The crop type data were obtained from the Dutch Land Parcel Identification System and GeoSpatial Aid Application, named Basisregistratie Percelen (BRP). Dutch farmers must annually record their field parcel boundaries and the associated cultivated crops.¹ The 12 yearly BRP layers (2009-2020) were merged through geographical polygon intersections. The output polygons correspond to the 12-year intersected areas and are associated with 386 crop codes. Polygons whose area is lower than half a hectare were discarded. The output product contains 974,000 polygons covering a total of 1,600 Mha.

For the evaluation, we propose three granularities of labels, using several aggregations led by a domain expert and yielding 386, 28 and 12 crop classes. The crop label categories for 2020, the year used as test set, follow a long-tailed class distribution, as shown for the 28-class aggregation in Figure 1.

¹ https://data.overheid.nl/data/dataset/basisregistratie-gewaspercelen-brp

Figure 1: Distributions of the crop types in the dataset. Green crops are the remaining ones for the 10-class evaluation.

'''Sentinel-2 data.''' The study relies on the analysis of optical Copernicus Sentinel-2 (S2) data. The S2 constellation provides observations with a minimum revisit of five days over ten spectral bands of the optical domain (460-2280 nm), with a spatial resolution of 10-20 m depending on the band.
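The expert-led label aggregation described above (386 codes collapsed into 28, 12, then 10 evaluation classes) can be sketched as a simple fine-to-coarse mapping; the crop names and groups below are invented placeholders, not the actual BRP codes:

```python
# Sketch of expert-led label aggregation: fine crop labels are mapped to
# coarser evaluation groups, and anything unmapped falls back to "other".
# All names here are hypothetical placeholders, not real BRP codes.
FINE_TO_COARSE = {
    "winter_wheat": "cereals",
    "spring_barley": "cereals",
    "silage_maize": "maize",
    "grain_maize": "maize",
    "starch_potato": "potatoes",
}

def aggregate(label: str, mapping=FINE_TO_COARSE, fallback="other") -> str:
    """Map a fine-grained crop label to its coarser evaluation class."""
    return mapping.get(label, fallback)

labels = ["winter_wheat", "grain_maize", "tulip"]
print([aggregate(l) for l in labels])  # ['cereals', 'maize', 'other']
```

The same pattern, applied with expert-defined tables of different sizes, yields each evaluation granularity.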
The data are processed up to surface reflectance (SR) Level-2A, accounting for atmospheric corrections and cloud/cloud-shadow screening with the sen2cor algorithm [20]. The data are available through the JEODPP platform [21]. Cloud-free SR data were processed to 20-m Leaf Area Index (LAI) and Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) using BV-NET [22] and the calibration settings of [23]. For each polygon, B4 (red band) SR, B8A (near-infrared band) SR, LAI and FAPAR were averaged at the polygon level using pixels in a 20-m inner buffer, in order to remove parcel edge effects.

'''Time series smoothing.''' Despite the cloud and cloud-shadow screening of L2A S2 products, noise remains in the resulting time series [24]. We applied a time-series outlier detection based on B4 (for omitted clouds) and B8A (for omitted cloud shadows) using the Hampel filter [25]. Flagged data were removed for the four variables. The filtered time series of the four variables were then smoothed with the Whittaker algorithm [26] as implemented by the World Food Programme.² Time series were first resampled and interpolated to a 2-day time step, and the Whittaker algorithm was then applied with V-curve optimization of the smoothing parameter. This yields 2-day smoothed time series of each of the four variables, from October of year N-1 to October of year N for the cropping season of year N.

² https://github.com/WFP-VAM/vam.whittaker

===2.2. Feature Extraction===

'''Crop types.''' The crop type labels contain 386 different types of crops over the 12 years of the study. We represent each crop as a one-hot vector of size V = 386, used as input to an embedding layer.

'''EO-based features.''' We integrate the EO time series spatially by averaging at the parcel level, then temporally using a sliding window of 30 days with a step of 15 days. For each parcel, this yields 25 windows per year for each Remote Sensing (RS) signal, which we integrate temporally using 7 statistical functionals: mean, standard deviation, 1st quartile, median, 3rd quartile, minimum and maximum. In total we obtain 7*4 = 28 features per window, leading to 700 features per year. With this configuration the windows overlap, which avoids losing information by breaking the signal dynamics, at the price of some redundancy in the features. On each window, each signal is summarized with statistical functionals, as is commonly done for speech data [27].
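The windowing above can be sketched in plain Python, assuming an already smoothed, regularly sampled series per signal; the quantile interpolation is one reasonable choice and may differ from the authors' implementation:

```python
import statistics

def quantile(sorted_vals, q):
    """Linear-interpolation quantile on a sorted list (0 <= q <= 1)."""
    n = len(sorted_vals)
    if n == 1:
        return sorted_vals[0]
    pos = q * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def window_functionals(series, win=15, step=8):
    """Summarize a regularly sampled series with 7 statistical functionals
    per overlapping window. With a 2-day sampling step, win=15 samples spans
    ~30 days and step=8 samples ~15 days, mirroring the paper's setup."""
    feats = []
    for start in range(0, max(len(series) - win + 1, 1), step):
        w = series[start:start + win]
        s = sorted(w)
        feats.append([
            statistics.mean(w),       # mean
            statistics.pstdev(w),     # standard deviation
            quantile(s, 0.25),        # 1st quartile
            quantile(s, 0.50),        # median
            quantile(s, 0.75),        # 3rd quartile
            min(w),                   # minimum
            max(w),                   # maximum
        ])
    return feats
```

Applied to the four smoothed variables (B4, B8A, LAI, FAPAR), each window contributes 7 × 4 = 28 features, and 25 windows give the 700 features per year used by the model.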
'''Spatial crop distribution.''' The spatial crop distribution was derived for the year 2019 (year N-1 with respect to the 2020 test set). For each polygon, we compute the total surface of each crop of the database within a 10-km circle and turn it into a percentage. This a-priori distribution of crops has been shown to be relatively stable in time, with only minor changes from year to year [28]. We round the probabilities at 10⁻⁴, so some small but non-null values become 0.

===2.3. Learning Model===

This section describes the learning model and how the features are integrated as observations.

'''Unimodal RNN-LSTM crop rotation model.''' We model the crop rotation at the yearly level using an LSTM trained like a language model: each crop can be seen as a token in a sentence, and a recurrent neural network is trained to predict the next word given the preceding ones. We first add an embedding layer that transforms the crop type c_t at time t into a vector (see Equation 1):

emb_t = f_e(c_t)   (1)

Then we feed this vector into the RNN to produce a hidden state h_t at time t (see Equation 2), which is used to predict the next crop c_{t+1} (see Equation 3).
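As a minimal stand-in for the LSTM language model over rotations, a count-based next-crop predictor illustrates the idea behind Equations 1-3: the LSTM replaces this count table with a learned state carrying the whole history. The crop names are invented for illustration:

```python
from collections import Counter, defaultdict

def fit_transitions(rotations):
    """Count crop-to-crop transitions over many parcels' rotation
    histories: a count-based analogue of P(c_{t+1} | c_t, ..., c_1)
    truncated to a single year of context."""
    table = defaultdict(Counter)
    for seq in rotations:
        for prev, nxt in zip(seq, seq[1:]):
            table[prev][nxt] += 1
    return table

def predict_next(table, crop):
    """Return the most frequent crop observed after `crop`, if any."""
    if not table[crop]:
        return None
    return table[crop].most_common(1)[0][0]

rotations = [
    ["potato", "wheat", "beet", "potato"],
    ["potato", "wheat", "barley"],
    ["wheat", "beet", "potato", "wheat"],
]
table = fit_transitions(rotations)
print(predict_next(table, "potato"))  # wheat
```

Unlike this bigram sketch, the LSTM conditions on the full rotation history and can be fed additional per-year inputs, which is what the multimodal extensions below exploit.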
h_t = LSTM_y(emb_t | h_{t-1})   (2)

P(c_{t+1} | c_t, ..., c_1) = f_c(h_t)   (3)

'''Features from the RS signal.''' Using only the past rotations to predict the following year's crop is very difficult, hence we add the available satellite information to make the model more robust. Firstly, we enhance the unimodal LSTM crop model by adding information from RS aligned at the year level, concatenating a unimodal RS vector with the crop embedding. Secondly, we process the RS signal beforehand with another RNN and concatenate the resulting unimodal RS vector with the crop embedding, in a hierarchical way; those networks are denoted with Hier- in their name.

'''Multimodal model with RS.''' For the first model, we integrate the RS features at the year level before the LSTM modeling the crop types. We feed the 700 features RS_t into a neural network layer f_rs to reduce their size, and then concatenate them with the crop embeddings before the LSTM (see Equation 4), using emb_MM_t instead of emb_t in Equation 2. This model is denoted LSTM_MM.

emb_MM_t = [emb_t, f_rs(RS_t)]   (4)

'''Bidirectional RNN-LSTM with attention to model the RS time series.''' The first model does not take into account the sequentiality of the RS signal. We correct this by processing the RS features at the year level with a first RNN before feeding their yearly representation into the second neural network modeling the crop types, leading to a hierarchical network [29]. This gives 28 features per window RS_tw, for a sequence length of 25 per year. We enhance the simple LSTM into a bidirectional LSTM (biLSTM) with a self-attention mechanism [30], following the assumption that some parts of the year are more important than others to discriminate the crop type. This model is denoted HierbiLSTM_MM.

The biLSTM is composed of two LSTMs, one reading the sequence forward and the other backward; the final hidden states are a concatenation of the forward and backward hidden states. For a sequence of inputs [RS_t1, ..., RS_tw] it outputs w hidden states [h_RS_t1, ..., h_RS_tw]. The attention layer computes a scalar weight u_tw for each h_RS_tw (see Equation 5) in order to aggregate them into the final state h_RS_t (see Equation 6).

u_tw = att(h_RS_tw)   (5)

h_RS_t = Σ_w u_tw h_RS_tw   (6)

'''Locally aggregated crop distributions.''' When classifying at the scale of a whole country, agricultural practices such as the types of crops in use can change. The distribution of crop types in a region is typically stable over the years and represents the kinds of crops expected in that part of the world. We integrate this local information by adding a vector representing the distribution over the crop types in a 10-km circle centered on the studied parcel. We add the distribution vector before the last layer because it is a high-level feature with respect to the task, and the deeper a layer, the higher-level its representations are w.r.t. the task [31]. We concatenate the hidden state h_t of the LSTM with the crop distribution vector d and mix them using two fully connected layers f_fc1 and f_fc2 (see Equation 7), obtaining h_d_t instead of h_t before the final fully connected layer f_c of Equation 3. This final model is denoted Final.

h_d_t = f_fc2(f_fc1([h_t, d]))   (7)

==3. Experiments and Results==

In this section we describe the experiments we ran with the different models and their results. Given the nature of our predictions, it can be useful to obtain them before the end of the farming season. We therefore used two prediction setups: an end-of-season configuration, where the neural network is fed with all the RS data of the year, and an early-classification configuration, where the RS data stop at different dates of the year. As a baseline, we use an LSTM processing the RS data and tagging at the year level, treating each year independently; this year-independent model achieves state-of-the-art results according to [7] and is denoted LSTM_YI.
===3.1. Experimental protocol===

We trained all the networks via mini-batch stochastic gradient descent, using the Adam optimizer [32] with a learning rate of 10⁻³ and a cross-entropy loss. The number of neurons of the crop embedding layer, of both RNN internal layers and of the fully connected RS layer f_rs, as well as the number of stacked LSTMs, were chosen by hyperparameter search. The sizes of the layers f_fc1 and f_fc2 are the same as that of the second RNN state h_t.

We trained our networks as for a sequence classification task, always with ten years of data. The labels from 2018 were used as training set, the labels from 2019 as development set, and the labels from 2020 as test set. All results presented hereafter refer to the analysis of 2020 crop types, based on models trained on the period 2009-2019 and thus independent from the 2020 crop type observations. We zero-padded when no RS data were available (before 2016). For the in-season classification model, we applied a data augmentation that randomly crops the end of the time series of each batch, starting from mid-March. All models were coded with the PyTorch library [33].

===3.2. Results===

We report results in two settings: the classical setting, where the network sees the whole year of RS signal, and an early-season setting, where the RS signal of the current season stops before the end of the season. To deal with unbalanced classes, we use unweighted (macro) F1, Precision and Recall, as well as Accuracy. We also use micro-F1, which is equivalent to Accuracy computed after removing some classes. We additionally present results for 10 classes, i.e. the 12-class setting without grassland and other crops (see Figure 1).

Model (modalities)    | 386-class P/R/F1/Acc | 28-class P/R/F1/Acc | 12-class P/R/F1/Acc | 10-class P/R/F1/m-F1
LSTM_Crop (C)         | 28.1/22.4/23.1/73.3  | 45.3/33.4/34.2/76.4 | 53.1/44.7/43.8/77.2 | 46.7/39.2/37.1/52.1
LSTM_YI [7] (RS)      | 14.3/9.8/9.9/72.5    | 53.1/45.1/45.9/88.5 | 75.0/66.3/67.7/90.4 | 72.1/62.2/63.7/80.3
LSTM_RS (RS)          | 13.7/11.8/10.8/72.5  | 49.7/47.0/44.1/87.4 | 70.6/69.4/65.3/89.0 | 67.3/65.3/60.7/76.0
HierbiLSTM_RS (RS)    | 10.7/10.0/9.0/78.7   | 48.0/48.1/44.5/88.7 | 72.7/71.0/67.7/90.7 | 69.3/66.6/63.0/79.1
LSTM_MM (RS+C)        | 32.5/26.1/26.2/86.8  | 63.3/57.4/56.7/91.8 | 79.9/78.5/78.2/93.2 | 77.9/75.2/75.4/85.1
HierbiLSTM_MM (RS+C)  | 42.0/33.8/35.1/88.5  | 68.9/62.8/63.2/93.5 | 84.1/80.8/81.3/94.5 | 82.3/77.9/78.8/87.8
Final (All)           | 41.0/33.3/34.3/89.7  | 71.4/62.7/63.2/93.8 | 85.5/81.2/82.6/94.8 | 84.1/78.3/80.2/88.7

Table 1: Results of the end-of-season classification models with different modalities (Remote Sensing, Crop Rotation, and Spatial Crop Distribution). The metrics are macro Precision, Recall and F1, as well as Accuracy and micro-F1 (m-F1).

===3.2.1. End-of-Season Classification===

The results of the end-of-season classification are given in Table 1. We tested different network configurations using different kinds of features. The best results are obtained with our final model, which uses information from the crop rotations, the S2 time series and the crop distribution of the surrounding fields (from 2019). At first glance, the model using only crop rotations still reaches an Accuracy of 73.3% on the 386-class problem, even though it uses no information from the current year to make its prediction.

Our RS models reach high results on the 386-class setting (up to 78.7% with the HierbiLSTM_RS model) because, contrary to most prior works, they also use RS data from the past years on the same parcel, which allows modeling a temporal context. Interestingly, the hierarchical setup with RS only allows reaching higher results on the 386-class configuration, going from an accuracy of 72.5 to 78.7 compared to LSTM_YI.

Finally, the local crop distribution vector brings a slight improvement, most visible in the 10-class configuration. However, it unexpectedly decreases the macro-F1 while increasing the Accuracy in the 386-class configuration. This can be interpreted as the model making more mistakes on non-frequent crops simply because it is globally better; one explanation is that the non-frequent crops are not all located in the same area, hence their distribution probability density is almost always approximated as 0.
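The rounding effect mentioned above, where a rare crop's neighborhood share is approximated as 0, can be seen in a sketch of the distribution-vector computation of Section 2.2; the neighborhood lookup (the 10-km circle) is assumed to be done beforehand, and the crops and areas are invented:

```python
def crop_distribution(neighbors, classes):
    """Turn the (crop, surface) pairs of the parcels found in a 10-km
    circle into an area-share vector over `classes`, rounded at 1e-4
    (so very rare crops end up with probability exactly 0)."""
    total = sum(area for _, area in neighbors)
    shares = {c: 0.0 for c in classes}
    for crop, area in neighbors:
        if crop in shares:
            shares[crop] += area
    return [round(shares[c] / total, 4) for c in classes]

# A tiny tulip parcel among large wheat/maize fields: its share of
# ~2.2e-5 rounds down to exactly 0.0 in the context vector d.
neighbors = [("wheat", 120.0), ("maize", 60.0), ("tulip", 0.004)]
d = crop_distribution(neighbors, ["wheat", "maize", "tulip"])
```

This is consistent with the interpretation above: crops that are rare everywhere contribute a context-vector entry of 0 regardless of where the parcel lies.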
===3.2.2. Toward In-Season Classification===

The RS signal shows good end-of-season results, but performance is known to degrade strongly when classifying during the season [11]. In this case, the crop-rotation-enhanced modality can help. For the in-season classification, we simply used our model trained over the whole year, with the data stopping at a given point of the year. In Figure 2, we compare the model using the RS signal only with the multimodal model. Note that we used the same "final" model to adapt to this noisy setup: the missing features, corresponding to unused months, were replaced by zeros. The results are thus preliminary, and poor performance is expected; a straightforward option would be to train a new model for each of the evaluated months of the in-season classification.

The multimodal model always outperforms the RS model, which is expected, especially at the beginning of the season when almost no information is available from the RS modality.
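The data augmentation used for the in-season models (random cropping of the end of the season, Section 3.1) can be sketched as truncating the current season's windows and zero-filling the remainder, so the feature tensor keeps a fixed size; the mid-March lower bound is expressed here as a minimum number of kept windows, a hypothetical parameter:

```python
import random

def crop_season(windows, min_keep=10, rng=random):
    """Randomly truncate the end of a season's window features and pad
    with zeros, simulating in-season prediction at an earlier date.
    `windows` is a list of per-window feature vectors; `min_keep` bounds
    how early the cut may fall (e.g. mid-March)."""
    n = len(windows)
    keep = rng.randint(min_keep, n)  # inclusive bounds
    pad = [[0.0] * len(windows[0]) for _ in range(n - keep)]
    return windows[:keep] + pad

season = [[1.0] * 28 for _ in range(25)]  # 25 windows of 28 features
augmented = crop_season(season, min_keep=10, rng=random.Random(0))
```

Applying a fresh random cut per batch exposes the model to many season lengths during training, which matches the zero-filled inputs it sees at in-season inference time.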
Figure 2: Comparison of early classification using different modalities, with/without data augmentation (m-F1 with 10 classes).

It is also interesting to note that the performance can drop below that of the unimodal crop model. This is likely because the model gives too much attention to the RS modality compared to the other ones, since the RS modality has a growing impact on performance as the season progresses. An option to counter this effect would be a gate that discards a noisy modality, as shown in [34, 35].

==4. Analysis==

For the sake of clarity, all analyses presented hereafter are limited to a set of crops of interest, corresponding to the 10-class setting. Details on the crops are provided in Figure 1.

'''High-precision examples.''' We present the results of our model on the subset of parcels where the precision is better than normal. From a crop monitoring perspective, this analysis can be very valuable: even if not 100% of the parcels are aggregated, the output may support a crop yield forecasting system, through the analysis of the crop-specific RS time series with the highest probability. We take the examples classified with a probability higher than 0.9 and compute metrics over them. Those examples represent a large part of the dataset: more than 536k parcels for the 12-class dataset and more than 148k for the 10-class dataset, representing respectively 90.0% of all the parcels and 76.5% of the parcels containing crops of interest. The results are shown in Table 2.

Thresh | 12-class P/F1/Acc | 10-class P/F1/m-F1
none   | 85.3/82.6/94.8    | 84.1/80.2/88.7
.9     | 93.7/88.2/98.2    | 93.1/86.4/95.8

Table 2: Results of the model using only the examples with a high probability for the predicted class.

'''In-season classification.''' We compare the vanilla model with the in-season classification model trained with our data-augmentation technique. The vanilla model only saw end-of-season examples during training, so it is normal that it performs worse in-season; this explains the drop in performance compared to a model taking only the crops into account. Surprisingly, the data augmentation does not help the RS-only model, but it allows the multimodal model to overtake the crop-only model in April.

==5. Conclusion and Future Works==

We presented an innovative study producing in-season crop mapping without relying on in-situ data of the current season. The approach relies on the analysis of several modalities: the crop rotations of the previous years, the Sentinel-2 time series of the previous and current years, and the previous year's local crop distribution in the neighboring parcels. A deep learning algorithm models all those modalities at different levels using a hierarchical LSTM. Firstly, we model the RS data with a bidirectional LSTM with attention, using a sliding window on the satellite signals and integrating them with statistical functionals as can be done for speech. Secondly, we feed the representation into another LSTM network modeling the crops as words and their rotations as sentences, as in a language model. Finally, we add a context vector on the last layer to provide information about the geographical location of the parcel. The methodology was tested over the croplands of the Netherlands, benefiting from 12 years of nationwide crop rotation data. More generally, our method outperforms by a large margin the classical state of the art using only an RNN or a Transformer to model one year of EO data.

Nevertheless, much room is left for future work. Adding more spectral bands to the EO data could improve the performance. The multimodality could be modeled better, at the level of the EO data using aligned or non-aligned multimodal time-series fusion models [36, 37], and at a higher level between static representations [34]. Finally, our model cannot be adapted to an unknown place where the crop rotations are not available; a domain adaptation method using few-shot learning could be useful in this case [38].

==References==

[1] I. Becker-Reshef, C. Justice, M. Sullivan, E. Vermote, C. Tucker, A. Anyamba, J. Small, E. Pak, E. Masuoka, J. Schmaltz, et al., Monitoring global croplands with coarse resolution earth observations: The Global Agriculture Monitoring (GLAM) project, Remote Sensing 2 (2010) 1589–1609.

[2] P. Defourny, S. Bontemps, N. Bellemans, C. Cara, G. Dedieu, E. Guzzonato, O. Hagolle, J. Inglada, L. Nicola, T. Rabaute, et al., Near real-time agriculture monitoring at national scale at parcel resolution: Performance assessment of the Sen2-Agri automated system in various cropping systems around the world, Remote Sensing of Environment 221 (2019) 551–568.

[3] M. Hansen, R. Dubayah, R. DeFries, Classification trees: an alternative to traditional land cover classifiers, International Journal of Remote Sensing 17 (1996) 1075–1081.

[4] J. Inglada, M. Arias, B. Tardy, O. Hagolle, S. Valero, D. Morin, G. Dedieu, G. Sepulcre, S. Bontemps, P. Defourny, et al., Assessment of an operational system for crop type map production using high temporal and spatial resolution satellite optical imagery, Remote Sensing 7 (2015) 12356–12379.

[5] D. M. Johnson, R. Mueller, et al., The 2009 Cropland Data Layer, Photogrammetric Engineering and Remote Sensing 76 (2010) 1201–1205.

[6] R. d'Andrimont, A. Verhegghen, G. Lemoine, P. Kempeneers, M. Meroni, M. van der Velde, From parcel to continental scale – a first European crop type map based on Sentinel-1 and LUCAS Copernicus in-situ observations, Remote Sensing of Environment 266 (2021) 112708.

[7] M. Rußwurm, S. Lefèvre, M. Körner, BreizhCrops: a satellite time series dataset for crop type identification, Time Series Workshop of the 36th ICML (2019).

[8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, arXiv:1706.03762 [cs] (2017).

[9] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780. doi:10.1162/neco.1997.9.8.1735.

[10] M. Rußwurm, M. Körner, Self-attention for raw optical satellite time series classification, ISPRS Journal of Photogrammetry and Remote Sensing 169 (2020) 421–435. doi:10.1016/j.isprsjprs.2020.06.006. arXiv:1910.10536.

[11] M. Rußwurm, R. Tavenard, S. Lefèvre, M. Körner, Early classification for agricultural monitoring from satellite time series, ICML AI4SocialGood workshop (2019). arXiv:1908.10283.

[12] M. Rußwurm, M. Körner, Multi-temporal land cover classification with sequential recurrent encoders, ISPRS International Journal of Geo-Information 7 (2018). doi:10.3390/ijgi7040129. arXiv:1802.02080.

[13] M. Rußwurm, M. Körner, Temporal vegetation modelling using long short-term memory networks for crop identification from medium-resolution multi-spectral satellite images, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops 2017-July (2017) 1496–1504. doi:10.1109/CVPRW.2017.193.

[14] S. Dogliotti, W. Rossing, M. Van Ittersum, ROTAT, a tool for systematically generating crop rotations, European Journal of Agronomy 19 (2003) 239–250.

[15] J. Osman, J. Inglada, J. F. Dejoux, Assessment of a Markov logic model of crop rotations for early crop mapping, Computers and Electronics in Agriculture 113 (2015) 234–243. doi:10.1016/j.compag.2015.02.015.

[16] R. Yaramasu, V. Bandaru, K. Pnvr, Pre-season crop type mapping using deep neural networks, Computers and Electronics in Agriculture 176 (2020) 105664. doi:10.1016/j.compag.2020.105664.

[17] C. Boryan, Z. Yang, R. Mueller, M. Craig, Monitoring US agriculture: The US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer program, Geocarto International 26 (2011) 341–358. doi:10.1080/10106049.2011.562309.

[18] D. M. Johnson, R. Mueller, Pre- and within-season crop type classification trained with archival land cover information, Remote Sensing of Environment 264 (2021) 112576.

[19] T. Mikolov, M. Karafiát, L. Burget, J. Černockỳ, S. Khudanpur, Recurrent neural network based language model, in: Eleventh Annual Conference of the International Speech Communication Association, 2010.

[20] J. Louis, V. Debaecker, B. Pflug, M. Main-Knorn, J. Bieniarz, U. Mueller-Wilm, E. Cadau, F. Gascon, Sentinel-2 Sen2Cor: L2A processor for users, in: Proceedings Living Planet Symposium 2016, Spacebooks Online, 2016, pp. 1–8.

[21] P. Soille, A. Burger, P. Hasenohr, P. Kempeneers, D. Rodriguez Aseretto, V. Syrris, V. Vasilev, D. Marchi, The JRC earth observation data and processing platform, Big Data From Space, Toulouse, France (2017).

[22] F. Baret, O. Hagolle, B. Geiger, P. Bicheron, B. Miras, M. Huc, B. Berthelot, F. Niño, M. Weiss, O. Samain, et al., LAI, FAPAR and FCover CYCLOPES global products derived from VEGETATION: Part 1: Principles of the algorithm, Remote Sensing of Environment 110.

[32] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations (2014) 1–13. arXiv:1412.6980.

[33] A. Paszke, S. Gross, F. Massa, A. Lerer, J.
Brad- (2007) 275–286. bury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, [23] M. Claverie, E. F. Vermote, M. Weiss, F. Baret, L. Antiga, et al., Pytorch: An imperative style, high- O. Hagolle, V. Demarez, Validation of coarse spatial performance deep learning library, Advances in resolution lai and fapar time series over cropland in neural information processing systems 32 (2019) southwest france, Remote Sensing of Environment 8026–8037. 139 (2013) 216–230. [34] J. Arevalo, T. Solorio, M. Montes-Y-Gómez, F. A. [24] M. Claverie, J. Ju, J. G. Masek, J. L. Dungan, E. F. Ver- González, Gated multimodal units for information mote, J.-C. Roger, S. V. Skakun, C. Justice, The har- fusion, in: 5th International Conference on Learn- monized landsat and sentinel-2 surface reflectance ing Representations, ICLR 2017 - Workshop Track data set, Remote Sensing of Environment 219 (2018) Proceedings, 2017. arXiv:1702.01992. 145–161. [35] M. Chen, S. Wang, P. P. Liang, T. Baltrušaitis, [25] R. K. Pearson, Outliers in process modeling and A. Zadeh, L.-P. Morency, Multimodal sentiment identification, IEEE Transactions on control sys- analysis with word-level fusion and reinforcement tems technology 10 (2002) 55–63. learning, in: Proceedings of the 19th ACM Interna- [26] P. H. Eilers, V. Pesendorfer, R. Bonifacio, Automatic tional Conference on Multimodal Interaction, 2017, smoothing of remote sensing data, in: 2017 9th pp. 163–171. International Workshop on the Analysis of Mul- [36] A. Zadeh, P. P. Liang, N. Mazumder, S. Poria, E. Cam- titemporal Remote Sensing Images (MultiTemp), bria, L.-P. Morency, Memory Fusion Network for IEEE, 2017, pp. 1–3. Multi-view Sequential Learning, in: AAAI, 2018. [27] B. Schuller, S. Steidl, A. Batliner, J. Hirschberg, J. K. arXiv:arXiv:1802.00927v1. Burgoon, A. Baird, A. Elkins, Y. Zhang, E. Coutinho, [37] J. Yang, Y. Wang, R. Yi, Y. Zhu, A. Rehman, K. Evanini, The INTERSPEECH 2016 Computa- A. Zadeh, S. Poria, L.-p. 
Morency, Unaligned tional Paralinguistics Challenge: Deception, Sincer- Human Multimodal Language Sequences (2020). ity & Native Language, in: Proceedings of the An- arXiv:arXiv:2010.11985v1. nual Conference of the International Speech Com- [38] M. Rußwurm, S. Wang, K. Marco, D. Lobell, Meta- munication Association, INTERSPEECH, 2016. Learning for Few-Shot Land Cover Classification, [28] F. A. Merlos, R. J. Hijmans, The scale dependency in: IEEE/CVF conference on computer vision and of spatial crop species diversity and its relation to pattern recognition workshops, 2019. temporal diversity, Proceedings of the National Academy of Sciences 117 (2020) 26176–26182. [29] I. V. Serban, A. Sordoni, Y. Bengio, A. Courville, J. Pineau, Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models (2015). URL: http://arxiv.org/abs/1507.04808. doi:10.1017/CBO9781107415324.004. arXiv:1507.04808. [30] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, Y. Bengio, End-to-end attention-based large vocabulary speech recognition, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2016-May (2016) 4945–4949. doi:10.1109/ICASSP.2016. 7472618. arXiv:1508.04395. [31] V. Sanh, T. Wolf, S. Ruder, H. Court, H. Row, A Hierarchical Multi-task Approach for Learning Em- beddings from Semantic Tasks, in: AAAI, 2018. arXiv:arXiv:1811.06031v1. [32] D. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, International Con-