<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Complex Data Challenges in Earth
Observation, July</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multimodal Crop Type Classification Fusing Multi-Spectral Satellite Time Series with Farmers Crop Rotations and Local Crop Distribution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valentin Barriere</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Claverie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>European Commission's Joint Research Center</institution>
          ,
          <addr-line>Via Fermi, 2749, 21027 Ispra VA</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>25</volume>
      <issue>2022</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Accurate, detailed, and timely crop type mapping provides very valuable information for institutions, helping them design policies that better match the needs of citizens. In the last decade, the amount of available data has increased dramatically, whether it comes from Remote Sensing (using Copernicus Sentinel-2 data) or directly from the farmers (providing in-situ crop information throughout the years and information on crop rotation). Nevertheless, most studies are restricted to a single modality (Remote Sensing data or crop rotation) and never fuse Earth Observation data with domain knowledge such as crop rotations. Moreover, when they use Earth Observation data, they are mostly restricted to one year of data and do not take the past years into account. In this context, we propose to tackle a land use and crop type classification task using three data types, with a Hierarchical Deep Learning algorithm that models the crop rotations like a language model, the satellite signals like a speech signal, and uses the crop distribution as an additional context vector. We obtained very promising results compared to classical approaches, increasing the Accuracy by 5.1 points in a 28-class setting (.948) and the micro-F1 by 9.6 points in a 10-class setting (.887), using only a set of crops of interest selected by an expert. Finally, we propose a data-augmentation technique that allows the model to classify the crop before the end of the season, which works surprisingly well in a multimodal setting.</p>
      </abstract>
      <kwd-group>
        <kwd>Remote Sensing</kwd>
        <kwd>Farmer's Rotations</kwd>
        <kwd>Multimodal System</kwd>
        <kwd>Hierarchical Model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>Timely and accurate crop type mapping provides valuable information for crop monitoring and production forecasting [1]. In-season crop type mapping can serve not only to better estimate crop areas, but also to improve yield forecasting by using crop-type-specific models. Crop type mapping is thus a major input of crop monitoring systems focused on the in-season forecasting of crop production.</p>
        <p>High-spatial-resolution time series make it possible to determine crop type at sub-parcel level in most agricultural areas. Most remote sensing classification systems rely on supervised techniques, requiring in-situ crop identification surveys. If the survey data are provided within the season, some systems [2] are designed to predict crop type along the season with a given uncertainty, even if the crop cycle is ongoing. However, such survey data are expensive, because labels from the current year are needed to train a model; they are difficult to collect at large scale and in most cases are delivered after the cropping season. There is therefore a high demand for crop type mapping that does not rely on survey data from the ongoing season.</p>
        <sec id="sec-1-1-1">
          <title>Earth Observation-based crop type mapping</title>
          <p>Machine learning classification methods have been widely tested to derive crop type maps from remote sensing data. Among the various methods, the Random Forest algorithm has proved its capacity to accurately identify crop types, handling large and non-parametric data sets [3]. Since 2015 and the launch of the first satellite of the Copernicus Sentinel-2 (S2) constellation, the perspective for crop type mapping at large scale has changed. The high spatial and temporal resolution of S2 indeed offers an appropriate data set to distinguish crop types, based on the spectral and temporal signals, at parcel or sub-parcel level in most agricultural regions. Taking advantage of this capacity, some operational systems have been developed [4, 2, 5], combining Earth Observation (EO) data, in-situ observations, and classification algorithms to deliver crop type maps at regional, national, or continental scale [6].</p>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <p>Approaches that do not rely on current-season survey data, such as the one proposed in this study, are based on models trained on past seasons and applied to the current one. In addition, we propose a data-augmentation method to obtain satisfactory results earlier in the season.</p>
        <sec id="sec-1-2-1">
          <title>Crop type mapping using Deep-Learning method</title>
          <p>Recent progress in deep learning benefits crop type mapping applications. In [7], the authors classify crop types at the parcel level, using data from French Brittany for the 2017 season. They compare a Transformer encoder [8] and a Recurrent Neural Network of the Long Short-Term Memory (LSTM) type [9]. They obtain comparable results with the Transformer and the LSTM, the former reaching the best accuracy (0.69) and the latter the best macro-F1 (0.59).</p>
          <p>In [10], the authors designed a parcel-level crop classifier using S2 and compared several approaches to model the signal, including a Transformer and an LSTM. They obtain overall accuracies between 0.85 and 0.92 with the LSTM, depending on the number of classes considered. A similar approach was applied by [11] to 40k Central European parcels using S2. They proposed a new early classification mechanism that enhances a classical model with an additional stopping probability based on the previously seen information.</p>
          <p>Finally, [12] use the same technique developed in [13], where they tackle crop classification at the pixel level, i.e. accounting for the spatial variation to detect parcel boundaries. They use a CNN-LSTM network on S2 images to classify 17 types of crops.</p>
          <p>As of 2022, we identified a single study combining the use of crop rotations and satellite time-series data over several years [18]. It presents a methodology to derive a near-real-time Cropland Data Layer over major US agricultural states. The methodology is nonetheless restricted to a limited number of crop types and to a Random Forest classifier, while recent progress in deep learning has shown large improvements on such data mining problems.</p>
        </sec>
      </sec>
      <sec id="sec-1-3">
        <title>Contributions</title>
        <p>We propose to model both the crop rotations and the S2 time series in a multimodal way using a hierarchical Long Short-Term Memory (LSTM) network.</p>
        <p>The contribution is novel in its design, as no previous work has fused the large amount of temporally fine-grained EO data with crop rotation analysis in an advanced deep learning method. The crop rotations and the S2 time series are complemented with the crop distribution of the neighboring fields, taken from the previous year. The crop rotations are modeled over the years as words would be in a language model [19], helped by the S2 time-series data, which are modeled as if they were the prosody of the speaker. The high-level features added on the last layer of the network can then be seen as the distribution of the words used by our speaker. Finally, we propose a data-augmentation technique for in-season classification that randomly crops the end of the RS time series. It allows learning a model able to classify the crop type without the whole time series, hence before the end of the season.</p>
        <sec id="sec-1-3-1">
          <title>Modeling the crop rotation sequences</title>
          <p>Crop rotation is a widely used agronomic technique for sustainable farming that preserves long-term soil quality. A good understanding and design of crop rotations are essential for sustainability and to mitigate the variability of agricultural productivity induced by climate change. Crop rotation depends on the farmer's management decisions, but some good practices are shared, which makes it possible to model crop rotation patterns [14]. These patterns nonetheless remain complex and unstable in time; changes may be related to, e.g., economic considerations (commodity prices) or administrative regulations (e.g., changes in subsidies). Models based on expert knowledge are thus very limited and rarely accurate over large areas and long periods. Alternatively, estimating crop sequence probabilities without a priori knowledge, using survey data and hidden Markov models, has been demonstrated in France [?]. However, survey data are not always available. Relying on machine learning techniques, [15] use a Markov Logic model to predict the following year's crop in France, with an accuracy of 60%. In [16], the authors used deep neural networks to reach a maximum accuracy of 88% on a 6-class portion of the US Cropland Data Layer (CDL) dataset over 12 years [17].</p>
        </sec>
        <sec id="sec-1-3-2">
          <title>2. Methodology</title>
          <sec id="sec-1-3-2-1">
            <title>2.1. Dataset</title>
            <p>The study focuses on data acquired over the Netherlands and covers the period 2009-2020 for the crop type labeling and the parcel identification, and the period 2016-2020 for the S2 data.</p>
          </sec>
        </sec>
      </sec>
      <sec id="sec-1-4">
        <title>Crop Type data</title>
        <p>The crop type data were obtained from the Dutch Land Parcel Identification System and GeoSpatial Aid Application, named Basisregistratie Gewaspercelen (BRP). Dutch farmers must annually record their field parcel boundaries and the associated cultivated crops.<sup>1</sup> The 12 yearly BRP datasets (2009-2020) were merged through geographical polygon intersections. The output polygons correspond to the 12-year intersected areas and are associated with 386 crop codes. Polygons with an area smaller than half a hectare were discarded. The output product contains 974,000 polygons covering a total of 1.6 Mha. For the evaluation, we propose 3 label granularities, obtained through several aggregations led by a domain expert and yielding 386, 28, and 12 crop classes.</p>
        <sec id="sec-1-4-1">
          <title>Motivation</title>
          <p>Many works use remote sensing to predict the crop type at pixel or parcel level using only the EO and in-situ observations of the current year. However, they consider the signal as independent from one year to another. Other works use the crop rotations of the parcels to tackle a pre-season prediction of the crop type, focusing on problems with few classes. In this case, too much information is missing to reach high performance.</p>
        </sec>
      </sec>
      <sec id="sec-1-5">
        <p><sup>1</sup> https://data.overheid.nl/data/dataset/basisregistratie-gewaspercelen-brp</p>
      </sec>
      <sec id="sec-1-6">
        <title>Data</title>
        <p>The study relies on the analysis of optical Copernicus Sentinel-2 (S2) data. The S2 constellation provides observations with a minimum revisit time of five days in ten land spectral bands of the optical domain (460-2280 nm), with a spatial resolution of 10-20 m depending on the band. The data are processed up to Level 2A surface reflectance (SR), accounting for atmospheric corrections and cloud/cloud-shadow screening with the sen2cor algorithm [20]. The data are available through the JEODPP platform [21]. Cloud-free SR data were processed into 20-m Leaf Area Index (LAI) and Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) using BV-NET [22] and the calibration settings of [23]. For each polygon, B4 (red band) SR, B8A (near-infrared band) SR, LAI, and FAPAR were averaged at polygon level using the pixels in a 20-m inner buffer, in order to remove parcel edge effects.</p>
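        <p>The time-series smoothing described in this section starts with an outlier screening based on B4 and B8A using the Hampel filter [25]. As a hedged illustration only, the sketch below implements the textbook Hampel identifier on a plain Python list: a sample is flagged when it lies more than n_sigmas scaled median absolute deviations (MAD) from the median of its sliding window. The half-window of 3 samples, the threshold of 3, and the sample values are assumptions chosen for the example, not the settings of this study.</p>
```python
import statistics

MAD_SCALE = 1.4826  # makes the MAD consistent with a standard deviation for Gaussian noise


def hampel_outliers(values, half_window=3, n_sigmas=3.0):
    """Return the indices flagged as outliers by a Hampel identifier.

    half_window and n_sigmas are illustrative defaults, not the paper's settings.
    """
    flagged = []
    n = len(values)
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = values[lo:hi]
        med = statistics.median(window)
        mad = MAD_SCALE * statistics.median([abs(v - med) for v in window])
        # Flag the sample if it deviates from the local median by more than n_sigmas MADs.
        if abs(values[i] - med) > n_sigmas * mad:
            flagged.append(i)
    return flagged


def drop_flagged(values, flagged):
    """Remove flagged samples, keeping (index, value) pairs so that the gaps
    can later be filled by resampling and interpolation."""
    bad = set(flagged)
    return [(i, v) for i, v in enumerate(values) if i not in bad]


# Hypothetical B4 reflectance series with one residual cloud spike at index 3.
b4 = [0.05, 0.06, 0.05, 0.40, 0.06, 0.05, 0.07, 0.06]
flags = hampel_outliers(b4)
clean = drop_flagged(b4, flags)
print(flags)  # [3]
```
        <p>Keeping the original indices of the surviving samples is what makes the subsequent resampling to a regular time step possible.</p>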
        <p>We integrate the EO time series spatially by averaging at the parcel level, then temporally using a sliding window of 30 days with a step of 15 days. For each parcel, this yields 25 windows over the whole year for each of the Remote Sensing (RS) signals, which we integrate temporally using 7 statistical functionals: mean, standard deviation, 1st quartile, median, 3rd quartile, minimum, and maximum. In total, we obtain 7 × 4 = 28 features per window (7 functionals for each of the 4 signals), leading to 25 × 28 = 700 features per year.</p>
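        <p>The temporal integration described above can be sketched as follows. This is a minimal plain-Python illustration, not the authors' code: it assumes each signal is a list of regularly sampled values, expresses the window size and step in samples (30 and 15, i.e. days for a daily series), and uses the population standard deviation and the default exclusive method of statistics.quantiles for the quartiles, none of which the text specifies. Features are stored window by window, so each window contributes 4 × 7 = 28 consecutive values.</p>
```python
import statistics


def window_functionals(series, size=30, step=15):
    """Summarize overlapping windows of a 1-D series with seven functionals:
    mean, standard deviation, 1st quartile, median, 3rd quartile, min, max."""
    rows = []
    for start in range(0, len(series) - size + 1, step):
        w = series[start:start + size]
        q1, med, q3 = statistics.quantiles(w, n=4)  # the three quartile cut points
        rows.append([
            statistics.fmean(w),
            statistics.pstdev(w),  # population std (an assumption)
            q1, med, q3,
            min(w), max(w),
        ])
    return rows


def year_features(signals, size=30, step=15):
    """Flatten the functionals of several signals (e.g. B4, B8A, LAI, FAPAR)
    window by window: n_windows * n_signals * 7 values."""
    per_signal = [window_functionals(s, size, step) for s in signals]
    flat = []
    for w_idx in range(len(per_signal[0])):
        for rows in per_signal:
            flat.extend(rows[w_idx])
    return flat


# Toy daily signal of 60 samples: windows start at days 0, 15 and 30.
toy = [float(day) for day in range(60)]
feats = year_features([toy, toy, toy, toy])
print(len(feats))  # 3 windows * 4 signals * 7 functionals = 84
```
        <p>With 25 windows and the four signals, the same layout gives 25 × 28 = 700 values per year, matching the feature count above.</p>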
        <p>With this configuration, consecutive windows overlap, which avoids losing information by breaking the signal dynamics, at the price of some redundancy in the features. On each window, we integrate each signal using statistical functionals, as is commonly done for speech data [27].</p>
        <sec id="sec-1-6-2">
          <title>Time series Smoothing</title>
          <p>Despite the cloud and cloud-shadow screening of L2A S2 products, noise remains in the resulting time series [24]. We applied a time-series outlier detection based on B4 (for omitted clouds) and B8A (for omitted cloud shadows) using the Hampel filter [25]. Filtered data were removed for the four variables. The filtered time series of the four variables were then smoothed using the Whittaker algorithm [26] implemented by the World Food Programme.<sup>2</sup> Time series were first resampled and interpolated to a 2-day time step, and then the Whittaker algorithm with V-curve optimization of the smoothing parameter was applied. This yielded 2-day smoothed time series of each of the four variables, from October N-1 to October N for the cropping season of year N.</p>
        </sec>
        <sec id="sec-1-6-3">
          <title>2.2. Feature Extractions</title>
          <sec id="sec-1-6-3-1">
            <title>Spatial Crop Distribution</title>
            <p>The spatial crop distribution was derived for the year 2019 (year N-1 with respect to the 2020 test set). For each polygon, we compute the total surface of each crop of the database within a 10-km circle and convert it to percentages. This a-priori distribution of crops has been shown to be relatively stable in time, with minor changes from year to year [28]. We round the probabilities to $10^{-4}$, so that some small but non-zero shares become 0.</p>
          </sec>
        </sec>
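        <p>A hedged sketch of the spatial crop distribution feature: for a target parcel, sum the area of every parcel of year N-1 whose centroid falls within 10 km, per crop code, convert the sums to shares, and round them to four decimals. Representing parcels by centroid coordinates in a metric projection and testing membership by centroid distance are assumptions; the text does not say how parcels crossing the circle boundary are handled. The crop codes and areas in the example are hypothetical.</p>
```python
import math
from collections import defaultdict


def local_crop_distribution(target_xy, parcels, crop_codes, radius_m=10_000.0, ndigits=4):
    """Share of each crop's area among the parcels whose centroid lies within
    radius_m of target_xy. parcels holds (x, y, area, crop) tuples, with
    coordinates in metres in a projected reference system."""
    area_per_crop = defaultdict(float)
    tx, ty = target_xy
    for x, y, area, crop in parcels:
        if math.hypot(x - tx, y - ty) > radius_m:
            continue  # centroid outside the circle
        area_per_crop[crop] += area
    total = sum(area_per_crop.values())
    if total == 0:
        return [0.0] * len(crop_codes)
    # A fixed crop order gives the vector d; rounding to 1e-4 turns
    # very small but non-zero shares into exactly 0.
    return [round(area_per_crop[c] / total, ndigits) for c in crop_codes]


crops = ["maize", "wheat", "tulip"]  # hypothetical crop codes
parcels = [
    (0.0, 0.0, 60_000.0, "maize"),
    (3_000.0, 4_000.0, 39_996.0, "wheat"),  # 5 km away
    (6_000.0, 0.0, 4.0, "tulip"),           # inside, but a tiny share
    (20_000.0, 0.0, 80_000.0, "wheat"),     # 20 km away, ignored
]
d = local_crop_distribution((0.0, 0.0), parcels, crops)
print(d)  # [0.6, 0.4, 0.0]
```
        <p>The tulip share (0.00004) illustrates the rounding effect mentioned above: a crop that is present in the neighborhood ends up with a probability of exactly 0.</p>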
        <sec id="sec-1-6-1">
          <title>Crop Types</title>
        </sec>
      </sec>
      <sec id="sec-1-7">
        <p>The crop type labels contain 386 different crop types over the 12 years of study. We represent each crop by a one-hot vector of size 386 and use it as input to an embedding layer.</p>
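        <p>Feeding a one-hot vector to an embedding layer is equivalent to selecting one row of the embedding matrix, which is why deep learning libraries implement embeddings as table lookups (for instance torch.nn.Embedding in PyTorch, the library used in this work). The toy check below, in plain Python, uses 4 crop codes and an embedding size of 3 instead of the 386 codes of the dataset; the matrix values are arbitrary.</p>
```python
def one_hot(index, size):
    """One-hot vector of length size with a 1 at position index."""
    v = [0.0] * size
    v[index] = 1.0
    return v


def vec_times_matrix(matrix, vec):
    """Compute vec^T @ matrix for a matrix stored as a list of rows."""
    n_cols = len(matrix[0])
    return [sum(vec[r] * matrix[r][c] for r in range(len(matrix))) for c in range(n_cols)]


# Toy embedding matrix: 4 crop codes (rows), embedding size 3 (columns).
E = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]
crop_id = 2
via_product = vec_times_matrix(E, one_hot(crop_id, len(E)))
via_lookup = E[crop_id]
print(via_product == via_lookup)  # True
```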
      </sec>
      <sec id="sec-1-8">
        <p><sup>2</sup> https://github.com/WFP-VAM/vam.whittaker</p>
      </sec>
      <sec id="sec-1-9">
        <title>2.3. Learning Model</title>
        <p>This section describes the learning model and the integration of the features as observations.</p>
        <sec id="sec-1-9-1">
          <title>Unimodal RNN-LSTM Crop Rotations model</title>
        </sec>
      </sec>
      <sec id="sec-1-10">
        <p>We model the crop rotation at the level of a year using an LSTM trained as a language model. Each crop can be seen as a token in a sentence, so a recurrent neural network can be trained to predict the next word given the preceding words.</p>
        <p>For a sequence of inputs $[\mathbf{RS}_1, \ldots, \mathbf{RS}_T]$, the biLSTM outputs $T$ hidden states $[\mathbf{h}_1, \ldots, \mathbf{h}_T]$, each being the concatenation of the forward and backward hidden states. The attention layer computes a scalar weight $\alpha_t$ for each $\mathbf{h}_t$ (see Equation 5) in order to aggregate them into the final state $\mathbf{h}$ (see Equation 6).</p>
        <p>$$\alpha_t = \mathrm{att}(\mathbf{h}_t) \tag{5}$$</p>
        <p>$$\mathbf{h} = \sum_{t=1}^{T} \alpha_t \mathbf{h}_t \tag{6}$$</p>
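        <p>The text does not spell out how the attention weights of Equations 5 and 6 are scored. A common choice, assumed in the sketch below and not confirmed by the paper, is a learned vector w that scores each hidden state with a dot product, followed by a softmax over the T time steps so that the weights are positive and sum to one; the pooled state is then the weighted sum of Equation 6. The vector w and the hidden states are toy values.</p>
```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w):
    """alpha_t = softmax_t(w . h_t) (assumed scoring), h = sum_t alpha_t * h_t (Eq. 6)."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]
    return alphas, pooled


# Three time steps of 2-dimensional biLSTM outputs (toy values).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alphas, h = attention_pool(H, w=[0.0, 0.0])
print(alphas)  # equal scores give uniform weights
print(h)
```
        <p>With a zero scoring vector the pooling reduces to a plain average; a learned w lets the model emphasize the windows of the year that best discriminate the crop.</p>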
        <sec id="sec-1-10-1">
          <title>Locally aggregated crop distributions</title>
        </sec>
      </sec>
      <sec id="sec-1-11">
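        <p>We first add an embedding layer to transform the crop type $c_t$ at time $t$ into a vector (see Equation 1).</p>
        <p>$$\mathbf{emb}_t = \mathrm{Emb}(c_t) \tag{1}$$</p>
        <p>Then we feed this vector into the RNN to produce a hidden state $\mathbf{h}_t$ at time $t$ (see Equation 2), which is used to predict the next crop $c_{t+1}$ (see Equation 3).</p>
        <p>$$\mathbf{h}_t = \mathrm{LSTM}(\mathbf{emb}_t \mid \mathbf{h}_{t-1}) \tag{2}$$</p>
        <p>$$p(c_{t+1} \mid c_t, \ldots, c_1) = \mathrm{FC}(\mathbf{h}_t) \tag{3}$$</p>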
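        <p>Training the crop-rotation LSTM as a language model (Equations 1 to 3) amounts to predicting, at every year t, the crop of year t+1 from the crops observed so far. The minimal sketch below shows how one parcel's rotation can be turned into (input, target) pairs, with targets shifted by one year as in standard language-model training. The crop vocabulary and the parcel history are hypothetical.</p>
```python
def encode(rotation, vocab):
    """Map crop names to the integer ids fed to the embedding layer."""
    return [vocab[c] for c in rotation]


def next_crop_pairs(rotation):
    """Pair each year's crop c_t with the next year's crop c_{t+1}
    (inputs c_1..c_{T-1}, targets c_2..c_T, i.e. teacher forcing)."""
    return list(zip(rotation[:-1], rotation[1:]))


vocab = {"potato": 0, "wheat": 1, "sugar beet": 2, "maize": 3}  # hypothetical codes
history = ["potato", "wheat", "sugar beet", "wheat", "potato"]   # hypothetical parcel
ids = encode(history, vocab)
pairs = next_crop_pairs(ids)
print(pairs)  # [(0, 1), (1, 2), (2, 1), (1, 0)]
```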
        <sec id="sec-1-11-1">
          <title>Multimodal model with RS</title>
          <p>For the first model, we integrate the RS features at the year level before the LSTM modeling the crop types. We feed the 700 RS features $\mathbf{RS}$ into a neural network layer $\mathrm{FC}_{RS}$ to reduce their size, and then concatenate them with the crop embeddings before the LSTM (see Equation 4), using $\mathbf{emb}'_t$ instead of $\mathbf{emb}_t$ in Equation 2. This model is denoted as LSTM.</p>
          <sec id="sec-1-11-1-1">
            <title>Features from RS signal</title>
            <p>Using only the past rotations to predict the following year's crop is very difficult; hence, we chose to add available information from satellite data in order to make the model more robust.</p>
            <p>First, we enhance the unimodal LSTM crop model with RS information aligned at the year level, concatenating this unimodal RS vector with the crop embedding. Second, we process the RS signal beforehand with another RNN and concatenate the resulting unimodal RS vector with the crop embedding, in a hierarchical way. These networks carry the prefix Hier- in their name.</p>
          </sec>
          <p>When classifying at the scale of a whole country, agricultural practices, such as the types of crops grown, can change. The distribution of crop types in a region is typically stable over the years and represents the kind of crops expected in that part of the world. We integrate this local information by adding a vector representing the distribution over the crop types in a 10-km circle centered on the studied parcel.</p>
          <p>We add the distribution vector before the last layer because it is a high-level feature with respect to the task we are tackling, and the deeper the layer, the higher-level its representations are with respect to the task [31]. We concatenate the hidden state $\mathbf{h}_t$ of the LSTM with the crop distribution vector $\mathbf{d}$ and mix them using two fully connected layers $\mathrm{FC}_1$ and $\mathrm{FC}_2$ (see Equation 7).</p>
          <p>Hence, we obtain $\tilde{\mathbf{h}}_t$ instead of $\mathbf{h}_t$ before the final fully connected layer $\mathrm{FC}$ of Equation 3. This final model is denoted as Final.</p>
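        <p>A toy sketch of the late fusion of Equation 7: the LSTM state is concatenated with the crop distribution vector d, passed through two fully connected layers FC_1 and FC_2, and the result replaces the LSTM state as input to the final classification layer. Following the experimental protocol, both layers keep the size of the LSTM state. The tanh activation after FC_1, the layer sizes, and all weights are assumptions for illustration; the text does not specify the activation.</p>
```python
import math


def dense(x, weights, bias, activation=None):
    """Fully connected layer y = W x + b, with W stored as a list of rows."""
    y = [sum(wij * xj for wij, xj in zip(row, x)) + b for row, b in zip(weights, bias)]
    if activation is not None:
        y = [activation(v) for v in y]
    return y


def fuse(h_t, d, fc1, fc2):
    """Eq. 7 sketch: h_tilde = FC_2(FC_1([h_t, d])); the tanh after FC_1 is assumed."""
    x = h_t + d  # list concatenation [h_t, d]
    hidden = dense(x, *fc1, activation=math.tanh)
    return dense(hidden, *fc2)


h_t = [0.5, -0.5]        # toy LSTM state (size 2)
d = [0.6, 0.4, 0.0]      # toy crop distribution over 3 crops
fc1 = ([[0.1] * 5, [-0.1] * 5], [0.0, 0.0])   # (weights, bias), output size 2
fc2 = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # identity, output size 2
h_tilde = fuse(h_t, d, fc1, fc2)
print(len(h_tilde))  # 2, same size as h_t
```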
          <p>$$\tilde{\mathbf{h}}_t = \mathrm{FC}_2\big(\mathrm{FC}_1([\mathbf{h}_t, \mathbf{d}])\big) \tag{7}$$</p>
          <p>$$\mathbf{emb}'_t = [\mathbf{emb}_t, \mathrm{FC}_{RS}(\mathbf{RS})] \tag{4}$$</p>
          <sec id="sec-1-11-1-2">
            <title>Bidirectional RNN-LSTM with attention to model the RS time-series</title>
            <p>The first model presented above does not take into account the sequential nature of the RS signal. We correct this by processing the RS features of each year with a first RNN before feeding their yearly representation into the second neural network, which models the crop types, leading to a hierarchical network [29]. The first RNN therefore receives 28 features per window $\mathbf{RS}_t$, for a sequence length of 25 windows per year.</p>
          </sec>
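        <p>For the hierarchical models, the 700 yearly RS features are read as a sequence of 25 windows of 28 features each, which the first RNN consumes step by step. A sketch of that regrouping, assuming the flat vector is stored window by window (28 consecutive values per window); if the features were stored signal by signal, the indexing would have to change. A zero vector, as used for years without S2 data, simply becomes a sequence of zero windows.</p>
```python
N_WINDOWS = 25   # 30-day windows with a 15-day step over one season
N_FEATURES = 28  # 4 signals x 7 functionals


def to_sequence(flat, n_windows=N_WINDOWS, n_features=N_FEATURES):
    """Regroup a flat yearly feature vector into [RS_1, ..., RS_T]
    (window-major storage is an assumption)."""
    if len(flat) != n_windows * n_features:
        raise ValueError(f"expected {n_windows * n_features} values, got {len(flat)}")
    return [flat[t * n_features:(t + 1) * n_features] for t in range(n_windows)]


flat = list(range(700))                 # placeholder for one parcel-year of features
seq = to_sequence(flat)
empty_year = to_sequence([0.0] * 700)   # zero-padded year without S2 data
print(len(seq), len(seq[0]))  # 25 28
```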
          <p>We chose to enhance the simple LSTM by using a bidirectional LSTM (biLSTM) with a self-attention mechanism [30], following the assumption that some parts of the year are more important than others to discriminate the crop type. This model is denoted as HierbiLSTM.</p>
          <p>The biLSTM is composed of two LSTMs: one reads the sequence forward and the other reads it backward. The final hidden states are a concatenation of the forward and backward hidden states.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Experiments and Results</title>
      <p>In this section, we describe the experiments we ran with the different models and their results. Because of the nature of our predictions, it can be useful to obtain them before the end of the farming season. We therefore used two prediction setups: an end-of-season configuration and an early-classification configuration. In the end-of-season configuration, we feed the neural network with all the RS data of the year, while in the early-classification configuration the RS data stop at different dates of the year. We compare our models with an LSTM that processes the RS data and classifies each year independently. This year-independent model obtained state-of-the-art results according to [7] and is referred to below as the year-independent LSTM.</p>
      <sec id="sec-2-1">
        <title>3.1. Experimental protocol</title>
        <p>We trained all the networks via mini-batch stochastic gradient descent using Adam as optimizer [32] with a learning rate of $10^{-3}$ and a cross-entropy loss function. The number of neurons of the crop embedding layer, of the internal layers of both RNNs, and of the fully connected RS layer $\mathrm{FC}_{RS}$, as well as the number of stacked LSTM layers, were chosen by hyperparameter search. The sizes of the layers $\mathrm{FC}_1$ and $\mathrm{FC}_2$ are the same as that of the second RNN state $\mathbf{h}_t$.</p>
        <p>We trained our networks as for a sequence classification task, always with ten years of data. The labels from 2018 were used as the training set, the labels from 2019 as the development set, and the labels from 2020 as the test set. All results presented hereafter refer to the analysis of 2020 crop types, based on models trained on the period 2009-2019 and thus independent of the 2020 crop type observations. We zero-padded the inputs when no RS data were available (before 2016).</p>
        <p>We applied data augmentation to the in-season classification model by randomly cropping the end of the time series for each batch, starting from mid-March. All models were implemented with the PyTorch library [33].</p>
        <p>At first glance, the model using only the crop rotations still reaches an Accuracy of 73.3% on the 386-class problem, even though it does not use any information from the current year to make its prediction.</p>
        <p>Our RS models reach high results on the 386-class problem (up to 78.7% with the HierbiLSTM model) because, contrary to most previous works, they also use RS data from the past years on the same parcel, which allows them to model a temporal context. Interestingly, the hierarchical setup with RS only reaches higher results on the 386-class configuration, going from an accuracy of 72.5 to 78.7 when compared to the year-independent LSTM.</p>
        <p>Finally, the local crop distribution vector allows for a slight improvement, which is more visible in the 10-class configuration. However, it unexpectedly decreases the macro-F1 while increasing the Accuracy for the 386-class configuration. This can be interpreted as the model making more mistakes on infrequent crops while being better overall. A possible explanation is that infrequent crops are not all located in the same area; hence their distribution probability is always approximated as 0.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3.2. Results</title>
      <p>In this section, we present the results in two settings: the classical setting, where the network sees the whole year of RS signal, and an early-season setting, where the RS signal of the current season stops before the end of the season. To deal with unbalanced classes, we use the unweighted (macro) F1, Precision, and Recall, as well as the Accuracy. We also use the micro-F1, which plays the role of the Accuracy when some classes are removed from the evaluation; with single-label predictions, micro-F1 and Accuracy coincide when all classes are kept.</p>
      <p>We also present results for 10 classes, which corresponds to the 12-class setting without grassland and other crops (see Figure 1).</p>
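      <p>To make the metric definitions concrete: with single-label predictions, micro-F1 computed over all classes equals the Accuracy, whereas macro-F1 (the unweighted mean of per-class F1) gives rare classes the same weight as frequent ones. When the evaluation is restricted to a subset of classes, as in the 10-class setting, micro-F1 pools true positives, false positives, and false negatives over the retained classes only and no longer equals the Accuracy. The plain-Python sketch below uses toy labels; restricting micro-F1 through an explicit list of labels is our assumption of how the removed classes are handled.</p>
```python
def counts(y_true, y_pred, label):
    """True positives, false positives and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores over the given labels."""
    return sum(f1_from_counts(*counts(y_true, y_pred, lab)) for lab in labels) / len(labels)


def micro_f1(y_true, y_pred, labels):
    """F1 computed from counts pooled over the given labels only."""
    tp = fp = fn = 0
    for lab in labels:
        a, b, c = counts(y_true, y_pred, lab)
        tp, fp, fn = tp + a, fp + b, fn + c
    return f1_from_counts(tp, fp, fn)


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels; "grass" plays the role of a class removed from the evaluation.
y_true = ["wheat", "wheat", "maize", "maize", "grass", "grass", "potato", "wheat"]
y_pred = ["wheat", "maize", "maize", "maize", "grass", "wheat", "potato", "wheat"]
ALL = ["wheat", "maize", "grass", "potato"]
OF_INTEREST = ["wheat", "maize", "potato"]

print(accuracy(y_true, y_pred), micro_f1(y_true, y_pred, ALL))  # 0.75 0.75
print(micro_f1(y_true, y_pred, OF_INTEREST))                     # 10/13, about 0.769
```
      <p>In practice, scikit-learn's sklearn.metrics.f1_score exposes the same choices through its average ("micro", "macro") and labels arguments.</p>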
      <sec id="sec-3-1">
        <title>3.2.1. End-of-Season Classification</title>
        <p>The results of the end-of-season classification are given in Table 1. We tested different network configurations using different kinds of features. The best results are obtained with our final model, which uses information from the crop rotations, the S2 time series, and the crop distribution of the surrounding fields.<sup>3</sup></p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2.2. Toward In-Season Classification</title>
        <sec id="sec-3-2-1">
          <p>We saw earlier that the RS signal yields good end-of-season results, but performance is known to degrade strongly when classifying during the season [11]. In this case, the crop rotation modality can help.</p>
          <p>For the in-season classification, we simply used our model trained on the whole year, with input data stopping at a given point of the year. In Figure 2, we compare the model using the RS signal only with the multimodal model. Note that we applied the same Final model to this noisy setup: the missing features, corresponding to the unused months, were replaced by zeros. The results are thus preliminary, and poor performance is expected. A straightforward option would be to train a new model for each evaluated month of the in-season classification.</p>
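          <p>The in-season inputs described above can be emulated by zeroing every window that comes after the prediction date, and the data augmentation of the experimental protocol (randomly cropping the end of the series for each batch, starting from mid-March) applies the same masking with a random cut-off. A hedged sketch, assuming the 25 windows start on October 1 of year N-1 with a 15-day step, so that window index 11 starts around March 15; that calendar mapping and the function names are illustrative, not taken from the paper.</p>
```python
import random

N_WINDOWS = 25
# Assumption: windows start on October 1 of year N-1 with a 15-day step,
# so window 11 starts about 165 days later, around March 15.
MID_MARCH_WINDOW = 11


def truncate_season(sequence, last_window):
    """Keep windows 0..last_window and replace later ones by zero vectors,
    mimicking a prediction made before the end of the season."""
    return [w if last_window >= t else [0.0] * len(w) for t, w in enumerate(sequence)]


def crop_batch_end(batch, rng, first_cut=MID_MARCH_WINDOW, n_windows=N_WINDOWS):
    """Data augmentation: draw one random cut-off per batch, from mid-March
    to the last window (inclusive), and truncate every sequence of the batch."""
    cut = rng.randint(first_cut, n_windows - 1)
    return [truncate_season(seq, cut) for seq in batch], cut


rng = random.Random(0)
season = [[1.0] * 28 for _ in range(N_WINDOWS)]  # toy parcel-year, all ones
augmented, cut = crop_batch_end([season, season], rng)
kept = sum(1 for w in augmented[0] if w[0] == 1.0)
print(cut, kept)  # the number of kept windows is cut + 1
```
          <p>At inference time, the same truncate_season call with a fixed cut-off reproduces the zero-filled in-season input; the augmentation simply exposes the model to such inputs during training.</p>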
          <p>The multimodal model always outperforms the RS model, as expected, especially at the beginning of the season, when almost no information is available from the RS modality.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Analysis</title>
      <p>It is also interesting to note that the performance drops below that of the unimodal crop model. This is likely because the models give too much weight to the RS modality compared to the other ones, since the RS modality has a higher impact on performance as the season progresses. One option to counter this effect would be to use a gate that discards a noisy modality, as shown in [34, 35].</p>
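      <p>The gating idea of [34] can be sketched for two modalities following the gated multimodal unit (GMU) formulation of that paper: each modality is projected through a tanh layer, a sigmoid gate z computed from both inputs decides, dimension by dimension, how much of each projection to keep, and the output is z·h_rs + (1 - z)·h_crop. This is an illustration of the suggested remedy, not part of the evaluated models; biases are omitted and the weights are toy values chosen so that the gate closes on a large RS input.</p>
```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(W, x):
    """Matrix-vector product with W stored as a list of rows (no bias, for brevity)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def gmu(x_rs, x_crop, W_rs, W_crop, W_z):
    """Two-modality gated multimodal unit: h = z * h_rs + (1 - z) * h_crop."""
    h_rs = [math.tanh(v) for v in linear(W_rs, x_rs)]
    h_crop = [math.tanh(v) for v in linear(W_crop, x_crop)]
    z = [sigmoid(v) for v in linear(W_z, x_rs + x_crop)]  # the gate sees both inputs
    h = [zi * a + (1.0 - zi) * b for zi, a, b in zip(z, h_rs, h_crop)]
    return h, z


# Toy 2-d inputs. W_z gives a large negative score when the RS input is large,
# so the gate closes on RS and the output follows the crop modality.
x_rs, x_crop = [3.0, 3.0], [0.5, -0.5]
eye = [[1.0, 0.0], [0.0, 1.0]]
W_z = [[-5.0, -5.0, 0.0, 0.0], [-5.0, -5.0, 0.0, 0.0]]
h, z = gmu(x_rs, x_crop, eye, eye, W_z)
print([round(v, 3) for v in h])  # [0.462, -0.462]
```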
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future works</title>
      <sec id="sec-5-1">
        <p>For the sake of clarity, all analyses presented hereafter in this section are limited to a set of crops of interest, corresponding to the 10-class setting. Details on the crops are provided in Figure 1.</p>
      </sec>
      <sec id="sec-5-2">
        <sec id="sec-5-2-1">
          <title>High Precision examples</title>
          <p>We present the results of our model on a subset of parcels for which the precision is higher than average. From a crop monitoring perspective, this analysis can be very valuable: even if not all parcels are covered, the output might support crop yield forecasting systems through the analysis of the crop-specific RS time series of the parcels classified with the highest probability.</p>
          <p>We select the examples classified with a probability greater than 0.9 and compute the metrics over them. These examples represent a large part of the dataset: more than 536k parcels for the 12-class dataset and more than 148k for the 10-class dataset, i.e. respectively 90.0% of all the parcels and 76.5% of the parcels containing crops of interest. The results are shown in Table 2.</p>
        </sec>
        <sec id="sec-5-2-2">
          <title>In-Season Classification</title>
          <p>We compare the vanilla model with the in-season classification model trained with our data-augmentation technique. The vanilla model has only seen end-of-season examples during training, so it is expected to perform worse when used in-season. This explains the decrease in performance compared to a model taking into account only the crops.</p>
          <p>Surprisingly, the data augmentation used for the in-season models does not work with the RS-only model, but it allows the multimodal model to outperform the crop-only model in April.</p>
        </sec>
        <p>We presented an innovative study to produce in-season crop mapping without relying on in-situ data of the current season. The approach relies on the analysis of several modalities, including the crop rotation of the previous years, the Sentinel-2 time series of the previous and current years, and the previous year's local crop distribution in the neighboring parcels. A deep learning algorithm was used to model all those modalities at different levels using a Hierarchical LSTM model. First, we modeled the RS data with a Bidirectional LSTM with Attention, using a sliding window on the satellite signals and integrating them with statistical functionals, as can be done for speech. Second, we fed this representation into another LSTM network modeling the crops as words and their rotation as a sentence, as is done with a language model. Finally, we added a context vector on the last layer in order to add information about the geographical location of the parcel. The designed methodology was tested over the cropland of the Netherlands, benefiting from 12 years of nationwide crop rotation data. More generally, our method outperforms by a large margin the classical state of the art that uses only an RNN or a Transformer to model the EO data at the level of a year.</p>
        <p>Nevertheless, there is still much room for future work. Adding more spectral bands to the EO data could improve performance. The multimodality could also be modeled better, at the level of EO data using multimodal aligned or non-aligned time-series fusion models [36, 37], and at a higher level between static representations [34].</p>
        <p>Finally, our model cannot be adapted to an unknown place where crop rotations are not available; in this case, a domain adaptation method using few-shot learning could be useful [38].
</p>
        <p>France (2017).</p>
        <p>[22] F. Baret, O. Hagolle, B. Geiger, P. Bicheron, B. Miras, M. Huc, B. Berthelot, F. Niño, M. Weiss, O. Samain, et al., LAI, fAPAR and fCover CYCLOPES global products derived from VEGETATION: Part 1: Principles of the algorithm, Remote Sensing of Environment 110 (2007) 275–286.</p>
        <p>[23] M. Claverie, E. F. Vermote, M. Weiss, F. Baret, O. Hagolle, V. Demarez, Validation of coarse spatial resolution LAI and FAPAR time series over cropland in southwest France, Remote Sensing of Environment 139 (2013) 216–230.</p>
        <p>[24] M. Claverie, J. Ju, J. G. Masek, J. L. Dungan, E. F. Vermote, J.-C. Roger, S. V. Skakun, C. Justice, The Harmonized Landsat and Sentinel-2 surface reflectance data set, Remote Sensing of Environment 219 (2018) 145–161.</p>
        <p>[25] R. K. Pearson, Outliers in process modeling and identification, IEEE Transactions on Control Systems Technology 10 (2002) 55–63.</p>
        <p>[26] P. H. Eilers, V. Pesendorfer, R. Bonifacio, Automatic smoothing of remote sensing data, in: 2017 9th International Workshop on the Analysis of Multitemporal Remote Sensing Images (MultiTemp), IEEE, 2017, pp. 1–3.</p>
        <p>[27] B. Schuller, S. Steidl, A. Batliner, J. Hirschberg, J. K. Burgoon, A. Baird, A. Elkins, Y. Zhang, E. Coutinho, K. Evanini, The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity &amp; Native Language, in: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2016.</p>
        <p>[28] F. A. Merlos, R. J. Hijmans, The scale dependency of spatial crop species diversity and its relation to temporal diversity, Proceedings of the National Academy of Sciences 117 (2020) 26176–26182.</p>
        <p>[29] I. V. Serban, A. Sordoni, Y. Bengio, A. Courville, J. Pineau, Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models (2015). URL: http://arxiv.org/abs/1507.04808. arXiv:1507.04808.</p>
        <p>[30] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, Y. Bengio, End-to-end attention-based large vocabulary speech recognition, in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2016-May (2016) 4945–4949. doi:10.1109/ICASSP.2016.7472618. arXiv:1508.04395.</p>
        <p>[31] V. Sanh, T. Wolf, S. Ruder, H. Court, H. Row, A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks, in: AAAI, 2018. arXiv:1811.06031v1.</p>
        <p>[32] D. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, International Conference on Learning Representations (2014) 1–13. URL: http://arxiv.org/abs/1412.6980. arXiv:1412.6980.</p>
        <p>[33] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, Advances in Neural Information Processing Systems 32 (2019) 8026–8037.</p>
        <p>[34] J. Arevalo, T. Solorio, M. Montes-Y-Gómez, F. A. González, Gated multimodal units for information fusion, in: 5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings, 2017. arXiv:1702.01992.</p>
        <p>[35] M. Chen, S. Wang, P. P. Liang, T. Baltrušaitis, A. Zadeh, L.-P. Morency, Multimodal sentiment analysis with word-level fusion and reinforcement learning, in: Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017, pp. 163–171.</p>
        <p>[36] A. Zadeh, P. P. Liang, N. Mazumder, S. Poria, E. Cambria, L.-P. Morency, Memory Fusion Network for Multi-view Sequential Learning, in: AAAI, 2018. arXiv:1802.00927v1.</p>
        <p>[37] J. Yang, Y. Wang, R. Yi, Y. Zhu, A. Rehman, A. Zadeh, S. Poria, L.-P. Morency, Unaligned Human Multimodal Language Sequences (2020). arXiv:2010.11985v1.</p>
        <p>[38] M. Rußwurm, S. Wang, K. Marco, D. Lobell, Meta-Learning for Few-Shot Land Cover Classification, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>