Deep Sensing of Ocean Wave Heights with Synthetic Aperture Radar

                      Brandon Quach,1,2 Yannik Glaser,2 Justin Stopa,3 Peter Sadowski2∗
                           1
                               Computing and Mathematical Sciences, California Institute of Technology
                                2
                                  Information and Computer Sciences, University of Hawai‘i at Mānoa
                                  3
                                    Ocean Resources and Engineering, University of Hawai‘i at Mānoa


                           Abstract                                  imaging to vital sea state information. Such reduced rep-
                                                                     resentations of high-dimensional data can be useful when
  The Sentinel-1 satellites equipped with synthetic aperture
                                                                     fitting statistical models to relatively small data sets, but
  radars (SAR) provide near global coverage of the world’s
  oceans every six days. We curate a data set of co-locations        they are also limiting; task-relevant information is almost-
  between SAR and altimeter satellites, and investigate the use      certainly lost when reducing a high-dimensional SAR image
  of deep learning to predict significant wave height from SAR.      to the low-dimensional CWAVE feature space.
  While previous models for predicting geophysical quantities            In this work, we attempt to extract additional informa-
  from SAR rely heavily on feature-engineering, our approach         tion from SAR images using deep learning with artificial
  learns directly from low-level image cross-spectra. Training       neural networks. Deep learning has proven to be an ex-
  on co-locations from 2015-2017, we demonstrate on test data        tremely effective approach to representation learning, lead-
  from 2018 that deep learning reduces the state-of-the-art root     ing to rapid advances in diverse fields such as computer vi-
  mean squared error by 50%, from 0.6 meters to 0.3 meters.          sion (Krizhevsky, Sutskever, and Hinton 2014), high-energy
                                                                     physics (Baldi, Sadowski, and Whiteson 2014; Sadowski
                       Introduction                                  and Baldi 2018), and chemistry (Lusci, Pollastri, and Baldi
Synthetic aperture radar (SAR) is an important remote sens-          2013; Duvenaud et al. 2015). Deep learning has the potential
ing technology able to achieve high spatial resolution (< 10         to make similar advances in remote sensing for oceanogra-
meter). From SAR satellite data, geophysical properties can          phy by extracting information directly from SAR modula-
be predicted using statistical models, enabling researchers to       tion cross-spectra.
monitor global sea states with unprecedented coverage, pre-
cision, frequency, and without the use of complicated SAR
modulation transfer functions (Schulz-Stellenfleth, König,
and Lehner 2007). Sea state information provides scientific
value in understanding the propagation of waves (Collard,
Ardhuin, and Chapron 2009; Stopa et al. 2016) and the ef-
fects of climate change (Young, Zieger, and Babanin 2011),
as well as immediate practical benefits such as alerting ships
to dangerously large waves created by storms.
   SARs capture sea surface roughness and many other geo-
physical phenomena (Wang et al. 2019). Therefore, pre-
dicting ocean wave signatures from SAR images typically
requires feature engineering — a dimensionality-reduction
step that extracts task-specific i nformation. C WAVE i s a
common feature set for describing wave properties in SAR
as a basis of 20 orthogonal features derived from the SAR
modulation spectra. CWAVE has been used to estimate                  Figure 1: Spatial distribution of co-locations between
the significant w ave h eight f or t he S ARs a board: 1 ) ERS-      Sentinel-1 SAR satellites and altimeter satellites, in 2 × 2◦
2 (Schulz-Stellenfleth, K önig, a nd L ehner 2 007), 2 ) EN-        bins.
VISAT (Li, Lehner, and Bruns 2011), and Sentinel-1 (Stopa
and Mouche 2017; Pleskachevsky et al. 2019) linking SAR                In this work, we first curate a data set of over 750,000
   ∗
   peter.sadowski@hawaii.edu                                         co-locations of SAR and altimeter satellites, which provides
Copyright c 2020, for this paper by its authors. Use permitted       SAR in conjunction with direct measurements of ocean
under Creative Commons License Attribution 4.0 International         wave heights. The data is used to train deep neural networks
(CCBY 4.0).
Figure 2: Level 1 SAR image covering a square 20 × 20 km area (left); real component of the image spectra obtained by taking
the 2D Fourier transform (center); and 20 orthogonal CWAVE basis functions designed to summarize the image spectra (right).
The inputs to the deep neural network are the real and imaginary components of the image spectra, represented as two 72 × 60
matrices.


to predict significant wave height, Hs , defined as the mean        inary modulation spectra (Johnsen and Collard 2009) (Fig-
of the top third of a wave height distribution. We compare          ure 2). The modulation spectra consists of two matrices (real
training on SAR image spectra vs. high-level CWAVE fea-             and imaginary) of shape 72 × 60 with one dimension cor-
tures, and analyze the effect of training data set size.            responding to wavenumber and the other direction. These
                                                                    two matrices were then stacked to form the input tensor with
                          Methods                                   shape 72 × 60 × 2. The 1-Hz altimeter dataset estimates sig-
                                                                    nificant wave heights with spatial footprints of 6 to 10 km.
Data                                                                The altimeter dataset consists of data merged from 6 differ-
We curate a training set of historical data from two types          ent altimeter missions and has been cross-calibrated between
of polar-orbiting satellites: Sentinel-1 SAR satellites and al-     platforms, as in Ribal and Young (Ribal and Young 2019).
timeter satellites. Because the satellites are on different tra-    The SAR image spectra were then pre-processed by center-
jectories, their paths frequently intersect, providing mea-         ing and scaling the real and imaginary image modulation
surements from roughly the same location at the same time.          spectra separately — each pixel was normalized by subtract-
Specifically, the data set is constructed using measurements        ing the overall mean and dividing by the overall standard
that are less than 3 hours apart and with spatial differences       deviation of all pixels and all co-locations.
less than 300 km, resulting in 753,777 co-location events              In addition to the SAR image spectra, we include a num-
from 2015 through 2018 that are geographically well dis-            ber of high-level features in our model. First, we include the
tributed (Figure 1). These events have both SAR imaging             time and distance between the satellite co-location measure-
from Sentinel-1 and significant wave height from an altime-         ments, normalized to have zero mean and unit variance —
ter, and provide a high-fidelity reference data set (Ribal and      while this information is only available during training. The
Young 2019).                                                        time (or distance) between satellite observations provides a
   The data set is split into training, validation, and test sets   rough estimate of how much we can trust the altimeter mea-
based on year of data collection. Co-location events from           surement to provide an accurate target because sea states can
2015 and 2016 were used as the training set, events from            change faster than our original time and space constraints.
2017 was used as a validation set, and events from 2018 was         These features are simply set to zero at prediction time. Sec-
used as held-out test set. The result was 303,574 training ex-      ond, the time-of-day was encoded as a value between -1 and
amples, 265,052 validation examples, and 185,151 test ex-           1 using the function f (t) = 2sin( 2πt
                                                                                                         48 ) − 1; this normaliza-
amples. The validation set was used for learning rate anneal-       tion helps stabilize the neural network optimization. Third,
ing, early stopping, and hyper-parameter selection, while the       latitude and longitude were encoded by representing each as
test set was only used for the final evaluation of the model.       an angle in the range [0, 2π) then taking the sine and cosine,
   The Sentinel-1 SAR data set consists of the real and imag-       resulting in four features total. Fourth, a binary label was
inary components computed from SAR modulation cross                 created to specify the SAR satellite; S1-A or S1-B are cali-
spectra. Each data point within the cross spectra was cre-          brated to produce comparable data, but there could be small
ated by taking the Level 1 SAR image with 5 × 5 m pixel             differences. Finally, we also include the 20 non-dimensional
resolution covering a 20 × 20 km area and applying a 2D             CWAVE parameters that are derived from the image spectra,
Fourier transformation to different ”looks” within the dwell        each normalized using standard scaling to have zero mean
time (Engen and Johnsen 1995) to obtain the real and imag-          and unit variance over the training examples.
                                                                                                                 Output
                  Input


                          72x60x2       35x29x64       16x13x128       7x5x256   Dense   Dense
                                                                                  256     256


                                                                                                 Dense Dense
                                                                                                  256   128


                          1x32
                                                          Dense 256


Figure 3: Deep neural network architecture with two input types: SAR image spectra comprising one real and one imaginary
channel (top), and 32 scalar-valued features (bottom). The SAR images are processed by multiple 2D convolution layers before
the two branches of the network are combined by three dense layers at the output.


Deep Learning                                                         • 1 time difference between altimeter and sentinel measure-
Deep Neural Network Architecture We propose a deep                      ments feature
neural network architecture that predicts significant wave            • 1 spatial difference between altimeter and sentinel mea-
height by using the input data from SAR image spectra. The              surements feature
model starts as two branches with separate inputs: one which
processes the spectral input and another which extracts in-           • 1 normalized radar cross section σ 0 feature
formation from the high-level features (Figure 3). The spec-          • 1 normalized variance of radar cross section feature.
tral input branch takes an input tensor of the shape (72, 60, 2)
where the real and imaginary values of the Fourier transform          These 32 high-level features are fed into 11 dense layers
are stacked along the third axis, analogous to the ‘colors’ of        with 256 ReLU hidden units each. Both branches yield a flat-
an RGB image. This input tensor is then fed sequentially              tened array of size 256 which are then concatenated to form
into three convolutional layers containing 64, 128, and 256           a single vector with 512 features. Two hidden dense layers
filters respectively. A filter size of 3×3 is maintained at each      of 256 and 128 hidden ReLU units then integrate the image
convolutional layer with a rectified linear unit (ReLU) acti-         spectra branch with the second branch. Finally, an output
vation. In addition, each layer is followed by a max pooling          layer with a dropout of 0.337 and softplus activation makes
layer with a 2 × 2 window. The final convolutional layer is           the final prediction.
fed into a global max-pooling layer which produces a flat-               This model is trained to minimize the mean squared er-
tened array of size 256. This is then fed into two additional         ror (MSE) using the Adam optimizer (Kingma and Ba 2014)
dense layers with 256 hidden units each with ReLU activa-             with a batch size of 128 and an initial learning rate of 0.0003.
tion. The non-spectral data branch consists an input layer of         The learning rate was decreased by 20% if the validation
the following 32 features:                                            set MSE did not improve over 4 epochs, and training was
                                                                      stopped when the validation set MSE did not improve over
• 20 CWAVE features                                                   10 epochs. The best model was trained for 35 epochs. The
• 1 time of day feature                                               dropout rate, initial learning rate, and batch size were opti-
• 2 latitude features (sine and cosine)                               mized using the SHERPA black-box optimization package
                                                                      for machine learning hyper-parameter tuning (Hertel et al.
• 2 longitude features (sine and cosine)                              2018) on a cluster Nvidia RTX 2080 Ti GPUs. One hundred
• 1 incidence angle feature                                           models were trained using the random search algorithm to
• 1 incidence angle mode feature (binary flag representing            optimize over the search space shown in Table 1.
  WV1 or WV2)                                                         CWAVE Models To measure the advantage of the deep
• 1 satellite source feature (binary flag representing                learning approach over simpler models, we also trained two
  Sentinel-1A and Sentinel-1B)                                        models that predict the significant wave height from the 32
                                                                   were trained with an identical DNN architecture and hyper-
          Table 1: Hyper-parameter Search Space                    parameters, but with specific high-level features removed:
          Parameter          Range                                 the 20 CWAVE parameters removed or the latitude and lon-
          Batch Size         {128, 256, 512, 1024}                 gitude features. In both models, time of day, incidence an-
          Learning Rate      [0.0001, 0.001]                       gle, incidence angle mode, satellite type, time and distance
          Dropout            [0.2, 0.5]                            difference between altimeter and sentinel data, normalized
                                                                   radar cross section σ 0 and normalized variance of radar
                                                                   cross section are still included. The results show that the
                                                                   high-level CWAVE features are still used by the model, but
high-level features alone: a simple linear regression model        only slightly — despite containing no additional informa-
and a deep neural network. The neural network consisted of         tion, these features add implicit bias to the model. The lo-
eleven dense hidden layers of 256 ReLU units, followed by          cation features do contain additional information — they
a layer of 64 ReLU units, and two outputs. The two out-            essentially allow the model to learn a prior over the wave
puts correspond to a heteroskedastic Gaussian distribution         heights at different locations — but these too only have a
N (y1 , y2 ), where y2 is restricted to ensure positive variance   small effect on performance.
by defining a custom activation function:
                                            
                            y2      y2 > 0,                                     Table 3: Feature Importance Study
                  y2 =        1
                                     y ≤0
                           1−y2      2

   Weights are initialized using the scaling suggested by (He                           No
                                                                                                                  Percentage
et al. 2015), and the conditional negative log-likelihood of        Wave          No    Lati-     All
                                                                                                                  of
the target values is minimized using the Adam optimizer             Height        CWAVE tude      Included
                                                                                                                  Total Data
(Kingma and Ba 2014) with mini-batches of size 1024. The                                Longitude
initial learning rate of 0.003 decays starting at epoch 300 at      All Waves     0.334 0.329     0.327           100%
a decay rate of 0.0005 applied at the end of each subsequent        <1m           0.439 0.421     0.392           1.4%
epoch. A dropout rate of 0.5 is applied to the penultimate          1m - 3m       0.263 0.255     0.255           66.4%
layer. Training is stopped when the validation loss doesn’t         3m - 8m       0.432 0.429     0.426           31.8%
improve after 15 epochs. The architecture, learning rate, and       >8m           1.145 1.187     1.216           0.4%
early-stopping were optimized with SHERPA.

                           Results                                    Finally, we explore the impact of increasing the size of
                                                                   the training set on the discrepancy between including and
To compare the three types of models after hyper-parameter         not including CWAVE parameters in our final model. In
tuning, we trained each on data from 2015-2016, and tested         this experiment, we fix the hyper-parameters, train on data
on events from 2018, enabling us to explore the relative ben-      from 2015-2017 (568,626 examples), then test on 2018. The
efits of deep learning and the use of image spectra features.      models are trained for a fixed 30 epochs where the learn-
Table 2 shows that the deep neural network trained on image        ing rate is annealed by a factor of 0.4 every 10 epochs. Ini-
spectra achieves a significantly lower root mean squared er-       tial learning rate and dropout are identical to that of our
ror (RMSE) of 0.33 meters compared to the other methods            optimal deep neural network architecture. Figure 4 shows
that rely only on the high-level features: 0.64 m for the lin-     the mean performance of six randomly-initialized networks
ear model and 0.43 m for the deep neural network trained on        trained with different fractions of the data set. An ensem-
CWAVE alone. Furthermore, this performance improvement             ble (arithmetic mean) of the 6 models using all features and
is consistent across small, medium, and large waves.               the complete training set gives a test RMSE of 0.307 — a
                                                                   50% reduction in RMSE from the previous state-of-the-art
                                                                   of 0.6 m (Stopa and Mouche 2017). This also approaches
       Table 2: Root Mean Squared Error on Test Set                the RMSE of satellite altimetry compared to buoy observa-
                                                                   tions (Ribal and Young 2019).
                                                 Percentage
 Wave          CWAVE CWAVE Deep
                                                 of                                       Discussion
 Height        Linear NN   NN
                                                 Total Data
 All Waves     0.642       0.433         0.327   100%              Our results demonstrate that a deep convolutional neural
 <1m           0.827       0.443         0.392   1.4%              network can extract useful representations from SAR im-
 1m - 3m       0.515       0.377         0.255   66.4%             age spectra that is not captured by engineered CWAVE fea-
 3m - 8m       0.781       0.514         0.426   31.8%             tures. In a direct comparison between two hyper-parameter
 >8m           3.226       1.512         1.216   0.4%              optimized deep neural networks, the network with the im-
                                                                   age spectra information obtained a 29% reduction in RMSE
                                                                   (0.33 m vs. 0.43 m). This is in keeping with the success of
  A feature importance study (Table 2) shows the de-               deep learning in other fields, where the expertly engineered
pendence on each set of features. Two additional models            features are discarded in favor of learned features.
                                                                  mance Center (4000107360/12/I-LG) and Sentinel-1 Ocean
                                                                  Study (S1-4SCI-16-0002). The altimetry data was sourced
                                                                  from the Integrated Marine Observing System (IMOS). All
                                                                  Sentinel-1 L2 data used in this study can be obtained from
                                                                  the Copernicus Data Hub (cophub.copernicus.eu). The au-
                                                                  thors would like to thank NVIDIA for a hardware grant
                                                                  to PS. The technical support and advanced computing re-
                                                                  sources from the University of Hawai‘i Information Tech-
                                                                  nology Services Cyberinfrastructure are gratefully acknowl-
                                                                  edged.

                                                                                         References
                                                                  Baldi, P.; Sadowski, P.; and Whiteson, D. 2014. Searching
                                                                  for exotic particles in high-energy physics with deep learn-
                                                                  ing. Nature Communications 5.
Figure 4: Performance of with both SAR image spectra and          Collard, F.; Ardhuin, F.; and Chapron, B. 2009. Monitoring
CWAVE parameters (blue line) and only SAR image spec-             and analysis of ocean swell fields from space: New methods
tra (orange curve), while varying the training set size. Each     for routine observations. Journal of Geophysical Research
point is an average of MSE objective over 6 trials with dif-      114(C7).
ferent random weight initializations.                             Duvenaud, D. K.; Maclaurin, D.; Iparraguirre, J.; Bombarell,
                                                                  R.; Hirzel, T.; Aspuru-Guzik, A.; and Adams, R. P. 2015.
                                                                  Convolutional networks on graphs for learning molecular
   Our results show that there is still some advantage to in-     fingerprints. In Advances in Neural Information Processing
cluding the CWAVE features in the model, even with a train-       Systems, 2215–2223.
ing set of over 500,000 examples. However, we also show
that this advantage diminishes as the number of training ex-      Engen, G., and Johnsen, H. 1995. SAR-ocean wave in-
amples increases (Figure 4). The CWAVE features, being            version using image cross spectra. IEEE Transactions on
derived from the image data, provide no additional infor-         Geoscience and Remote Sensing 33(4):1047–1056.
mation, but they help bias the model towards a good solu-         He, K.; Zhang, X.; Ren, S.; and Sun, J. 2015. Delving deep
tion. As the training data set increases, the model is slower     into rectifiers: Surpassing human-level performance on im-
to overfit, and the benefit of including the CWAVE features       agenet classification. In Proceedings of the IEEE Interna-
disappears.                                                       tional Conference on Computer Vision, 1026–1034.
   Latitude and longitude information is useful for predict-
                                                                  Hertel, L.; Collado, J.; Sadowski, P.; and Baldi, P. 2018.
ing significant wave heights because there are regional char-
                                                                  Sherpa: Hyperparameter optimization for machine learning
acteristics of the wave climate (Stopa et al. 2013). How-
                                                                  models.
ever, the feature importance study shows that our model only
makes minimal use of this information. This is encouraging,       Johnsen, H., and Collard, F. 2009. Sentinel-1 ocean swell
because it implies that the model is relying almost entirely      wave spectra (osw) algorithm definition. Technical report,
on the direct measurements rather than geographical infor-        Northern Research Institute.
mation.                                                           Kingma, D. P., and Ba, J. 2014. Adam: A method for
                                                                  stochastic optimization. arXiv preprint arXiv:1412.6980.
                       Conclusion                                 Krizhevsky, A.; Sutskever, I.; and Hinton, G. 2014. Ima-
Our results demonstrate that deep learning provides a 50%         genet classification with deep convolutional neural. In Neu-
decrease in RMSE compared to the previous state-of-the-           ral Information Processing Systems, 1–9.
art in predicting significant wave height from SAR. Instead
of relying on the set of engineered CWAVE features that           Li, X.-M.; Lehner, S.; and Bruns, T. 2011. Ocean wave inte-
capture most of the discriminative information, our deep          gral parameter measurements using envisat asar wave mode
learning approach learns directly from the low-level, high-       data. IEEE Transactions on Geoscience and Remote Sensing
dimensional image spectra. Furthermore, our results indicate      49(1):155–174.
that there is still room for improvement with additional train-   Lusci, A.; Pollastri, G.; and Baldi, P. 2013. Deep architec-
ing data. Thus, we should we should expect the performance        tures and deep learning in chemoinformatics: the prediction
of our model to increase as more co-location events are col-      of aqueous solubility for drug-like molecules. Journal of
lected over the next couple years.                                Chemical Information and Modeling 53(7):1563–1575.
                                                                  Pleskachevsky, A.; Jacobsen, S.; Tings, B.; and Schwarz, E.
                   Acknowledgments                                2019. Estimation of sea state from sentinel-1 synthetic aper-
This work was made possible thanks to SAR data access             ture radar imagery for maritime situation awareness. Inter-
granted by ESA projects: Sentinel-1 A Mission Perfor-             national Journal of Remote Sensing 40(11):4104–4142.
Ribal, A., and Young, I. R. 2019. 33 years of globally cal-
ibrated wave height and wind speed data based on altimeter
observations. Scientific Data 6(1).
Sadowski, P., and Baldi, P. 2018. Deep learning in the natu-
ral sciences: applications to physics. In Braverman Readings
in Machine Learning. Key Ideas from Inception to Current
State. Springer. 269–297.
Schulz-Stellenfleth, J.; König, T.; and Lehner, S. 2007. An
empirical approach for the retrieval of integral ocean wave
parameters from synthetic aperture radar data. Journal of
Geophysical Research 112(C3).
Stopa, J. E., and Mouche, A. 2017. Significant wave heights
from sentinel-1 sar: Validation and applications. Journal of
Geophysical Research: Oceans 122(3):1827–1848.
Stopa, J. E.; Cheung, K. F.; Tolman, H. L.; and Chawla, A.
2013. Patterns and cycles in the climate forecast system re-
analysis wind and wave data. Ocean Modelling 70:207–220.
Stopa, J. E.; Ardhuin, F.; Husson, R.; Jiang, H.; Chapron,
B.; and Collard, F. 2016. Swell dissipation from 10 years
of envisat advanced synthetic aperture radar in wave mode.
Geophysical Research Letters 43(7):3423–3430.
Wang, C.; Mouche, A.; Tandeo, P.; Stopa, E. J.; Longepe,
N.; Erhard, G.; Foster, R.; Vandemark, D.; and Chapron,
B. 2019. A labeled Ocean SAR Imagery Dataset of Ten
Geophysical Phenomena from Sentinel-1 Wave Mode. Geo-
science Data Journal.
Young, I. R.; Zieger, S.; and Babanin, A. V. 2011.
Global trends in wind speed and wave height. Science
332(6028):451–455.