Deep Sensing of Ocean Wave Heights with Synthetic Aperture Radar Brandon Quach,1,2 Yannik Glaser,2 Justin Stopa,3 Peter Sadowski2∗ 1 Computing and Mathematical Sciences, California Institute of Technology 2 Information and Computer Sciences, University of Hawai‘i at Mānoa 3 Ocean Resources and Engineering, University of Hawai‘i at Mānoa Abstract imaging to vital sea state information. Such reduced rep- resentations of high-dimensional data can be useful when The Sentinel-1 satellites equipped with synthetic aperture fitting statistical models to relatively small data sets, but radars (SAR) provide near global coverage of the world’s oceans every six days. We curate a data set of co-locations they are also limiting; task-relevant information is almost- between SAR and altimeter satellites, and investigate the use certainly lost when reducing a high-dimensional SAR image of deep learning to predict significant wave height from SAR. to the low-dimensional CWAVE feature space. While previous models for predicting geophysical quantities In this work, we attempt to extract additional informa- from SAR rely heavily on feature-engineering, our approach tion from SAR images using deep learning with artificial learns directly from low-level image cross-spectra. Training neural networks. Deep learning has proven to be an ex- on co-locations from 2015-2017, we demonstrate on test data tremely effective approach to representation learning, lead- from 2018 that deep learning reduces the state-of-the-art root ing to rapid advances in diverse fields such as computer vi- mean squared error by 50%, from 0.6 meters to 0.3 meters. sion (Krizhevsky, Sutskever, and Hinton 2014), high-energy physics (Baldi, Sadowski, and Whiteson 2014; Sadowski Introduction and Baldi 2018), and chemistry (Lusci, Pollastri, and Baldi Synthetic aperture radar (SAR) is an important remote sens- 2013; Duvenaud et al. 2015). Deep learning has the potential ing technology able to achieve high spatial resolution (< 10 to make similar advances in remote sensing for oceanogra- meter). From SAR satellite data, geophysical properties can phy by extracting information directly from SAR modula- be predicted using statistical models, enabling researchers to tion cross-spectra. monitor global sea states with unprecedented coverage, pre- cision, frequency, and without the use of complicated SAR modulation transfer functions (Schulz-Stellenfleth, König, and Lehner 2007). Sea state information provides scientific value in understanding the propagation of waves (Collard, Ardhuin, and Chapron 2009; Stopa et al. 2016) and the ef- fects of climate change (Young, Zieger, and Babanin 2011), as well as immediate practical benefits such as alerting ships to dangerously large waves created by storms. SARs capture sea surface roughness and many other geo- physical phenomena (Wang et al. 2019). Therefore, pre- dicting ocean wave signatures from SAR images typically requires feature engineering — a dimensionality-reduction step that extracts task-specific i nformation. C WAVE i s a common feature set for describing wave properties in SAR as a basis of 20 orthogonal features derived from the SAR modulation spectra. CWAVE has been used to estimate Figure 1: Spatial distribution of co-locations between the significant w ave h eight f or t he S ARs a board: 1 ) ERS- Sentinel-1 SAR satellites and altimeter satellites, in 2 × 2◦ 2 (Schulz-Stellenfleth, K önig, a nd L ehner 2 007), 2 ) EN- bins. VISAT (Li, Lehner, and Bruns 2011), and Sentinel-1 (Stopa and Mouche 2017; Pleskachevsky et al. 2019) linking SAR In this work, we first curate a data set of over 750,000 ∗ peter.sadowski@hawaii.edu co-locations of SAR and altimeter satellites, which provides Copyright c 2020, for this paper by its authors. Use permitted SAR in conjunction with direct measurements of ocean under Creative Commons License Attribution 4.0 International wave heights. The data is used to train deep neural networks (CCBY 4.0). Figure 2: Level 1 SAR image covering a square 20 × 20 km area (left); real component of the image spectra obtained by taking the 2D Fourier transform (center); and 20 orthogonal CWAVE basis functions designed to summarize the image spectra (right). The inputs to the deep neural network are the real and imaginary components of the image spectra, represented as two 72 × 60 matrices. to predict significant wave height, Hs , defined as the mean inary modulation spectra (Johnsen and Collard 2009) (Fig- of the top third of a wave height distribution. We compare ure 2). The modulation spectra consists of two matrices (real training on SAR image spectra vs. high-level CWAVE fea- and imaginary) of shape 72 × 60 with one dimension cor- tures, and analyze the effect of training data set size. responding to wavenumber and the other direction. These two matrices were then stacked to form the input tensor with Methods shape 72 × 60 × 2. The 1-Hz altimeter dataset estimates sig- nificant wave heights with spatial footprints of 6 to 10 km. Data The altimeter dataset consists of data merged from 6 differ- We curate a training set of historical data from two types ent altimeter missions and has been cross-calibrated between of polar-orbiting satellites: Sentinel-1 SAR satellites and al- platforms, as in Ribal and Young (Ribal and Young 2019). timeter satellites. Because the satellites are on different tra- The SAR image spectra were then pre-processed by center- jectories, their paths frequently intersect, providing mea- ing and scaling the real and imaginary image modulation surements from roughly the same location at the same time. spectra separately — each pixel was normalized by subtract- Specifically, the data set is constructed using measurements ing the overall mean and dividing by the overall standard that are less than 3 hours apart and with spatial differences deviation of all pixels and all co-locations. less than 300 km, resulting in 753,777 co-location events In addition to the SAR image spectra, we include a num- from 2015 through 2018 that are geographically well dis- ber of high-level features in our model. First, we include the tributed (Figure 1). These events have both SAR imaging time and distance between the satellite co-location measure- from Sentinel-1 and significant wave height from an altime- ments, normalized to have zero mean and unit variance — ter, and provide a high-fidelity reference data set (Ribal and while this information is only available during training. The Young 2019). time (or distance) between satellite observations provides a The data set is split into training, validation, and test sets rough estimate of how much we can trust the altimeter mea- based on year of data collection. Co-location events from surement to provide an accurate target because sea states can 2015 and 2016 were used as the training set, events from change faster than our original time and space constraints. 2017 was used as a validation set, and events from 2018 was These features are simply set to zero at prediction time. Sec- used as held-out test set. The result was 303,574 training ex- ond, the time-of-day was encoded as a value between -1 and amples, 265,052 validation examples, and 185,151 test ex- 1 using the function f (t) = 2sin( 2πt 48 ) − 1; this normaliza- amples. The validation set was used for learning rate anneal- tion helps stabilize the neural network optimization. Third, ing, early stopping, and hyper-parameter selection, while the latitude and longitude were encoded by representing each as test set was only used for the final evaluation of the model. an angle in the range [0, 2π) then taking the sine and cosine, The Sentinel-1 SAR data set consists of the real and imag- resulting in four features total. Fourth, a binary label was inary components computed from SAR modulation cross created to specify the SAR satellite; S1-A or S1-B are cali- spectra. Each data point within the cross spectra was cre- brated to produce comparable data, but there could be small ated by taking the Level 1 SAR image with 5 × 5 m pixel differences. Finally, we also include the 20 non-dimensional resolution covering a 20 × 20 km area and applying a 2D CWAVE parameters that are derived from the image spectra, Fourier transformation to different ”looks” within the dwell each normalized using standard scaling to have zero mean time (Engen and Johnsen 1995) to obtain the real and imag- and unit variance over the training examples. Output Input 72x60x2 35x29x64 16x13x128 7x5x256 Dense Dense 256 256 Dense Dense 256 128 1x32 Dense 256 Figure 3: Deep neural network architecture with two input types: SAR image spectra comprising one real and one imaginary channel (top), and 32 scalar-valued features (bottom). The SAR images are processed by multiple 2D convolution layers before the two branches of the network are combined by three dense layers at the output. Deep Learning • 1 time difference between altimeter and sentinel measure- Deep Neural Network Architecture We propose a deep ments feature neural network architecture that predicts significant wave • 1 spatial difference between altimeter and sentinel mea- height by using the input data from SAR image spectra. The surements feature model starts as two branches with separate inputs: one which processes the spectral input and another which extracts in- • 1 normalized radar cross section σ 0 feature formation from the high-level features (Figure 3). The spec- • 1 normalized variance of radar cross section feature. tral input branch takes an input tensor of the shape (72, 60, 2) where the real and imaginary values of the Fourier transform These 32 high-level features are fed into 11 dense layers are stacked along the third axis, analogous to the ‘colors’ of with 256 ReLU hidden units each. Both branches yield a flat- an RGB image. This input tensor is then fed sequentially tened array of size 256 which are then concatenated to form into three convolutional layers containing 64, 128, and 256 a single vector with 512 features. Two hidden dense layers filters respectively. A filter size of 3×3 is maintained at each of 256 and 128 hidden ReLU units then integrate the image convolutional layer with a rectified linear unit (ReLU) acti- spectra branch with the second branch. Finally, an output vation. In addition, each layer is followed by a max pooling layer with a dropout of 0.337 and softplus activation makes layer with a 2 × 2 window. The final convolutional layer is the final prediction. fed into a global max-pooling layer which produces a flat- This model is trained to minimize the mean squared er- tened array of size 256. This is then fed into two additional ror (MSE) using the Adam optimizer (Kingma and Ba 2014) dense layers with 256 hidden units each with ReLU activa- with a batch size of 128 and an initial learning rate of 0.0003. tion. The non-spectral data branch consists an input layer of The learning rate was decreased by 20% if the validation the following 32 features: set MSE did not improve over 4 epochs, and training was stopped when the validation set MSE did not improve over • 20 CWAVE features 10 epochs. The best model was trained for 35 epochs. The • 1 time of day feature dropout rate, initial learning rate, and batch size were opti- • 2 latitude features (sine and cosine) mized using the SHERPA black-box optimization package for machine learning hyper-parameter tuning (Hertel et al. • 2 longitude features (sine and cosine) 2018) on a cluster Nvidia RTX 2080 Ti GPUs. One hundred • 1 incidence angle feature models were trained using the random search algorithm to • 1 incidence angle mode feature (binary flag representing optimize over the search space shown in Table 1. WV1 or WV2) CWAVE Models To measure the advantage of the deep • 1 satellite source feature (binary flag representing learning approach over simpler models, we also trained two Sentinel-1A and Sentinel-1B) models that predict the significant wave height from the 32 were trained with an identical DNN architecture and hyper- Table 1: Hyper-parameter Search Space parameters, but with specific high-level features removed: Parameter Range the 20 CWAVE parameters removed or the latitude and lon- Batch Size {128, 256, 512, 1024} gitude features. In both models, time of day, incidence an- Learning Rate [0.0001, 0.001] gle, incidence angle mode, satellite type, time and distance Dropout [0.2, 0.5] difference between altimeter and sentinel data, normalized radar cross section σ 0 and normalized variance of radar cross section are still included. The results show that the high-level CWAVE features are still used by the model, but high-level features alone: a simple linear regression model only slightly — despite containing no additional informa- and a deep neural network. The neural network consisted of tion, these features add implicit bias to the model. The lo- eleven dense hidden layers of 256 ReLU units, followed by cation features do contain additional information — they a layer of 64 ReLU units, and two outputs. The two out- essentially allow the model to learn a prior over the wave puts correspond to a heteroskedastic Gaussian distribution heights at different locations — but these too only have a N (y1 , y2 ), where y2 is restricted to ensure positive variance small effect on performance. by defining a custom activation function:   y2 y2 > 0, Table 3: Feature Importance Study y2 = 1 y ≤0 1−y2 2 Weights are initialized using the scaling suggested by (He No Percentage et al. 2015), and the conditional negative log-likelihood of Wave No Lati- All of the target values is minimized using the Adam optimizer Height CWAVE tude Included Total Data (Kingma and Ba 2014) with mini-batches of size 1024. The Longitude initial learning rate of 0.003 decays starting at epoch 300 at All Waves 0.334 0.329 0.327 100% a decay rate of 0.0005 applied at the end of each subsequent <1m 0.439 0.421 0.392 1.4% epoch. A dropout rate of 0.5 is applied to the penultimate 1m - 3m 0.263 0.255 0.255 66.4% layer. Training is stopped when the validation loss doesn’t 3m - 8m 0.432 0.429 0.426 31.8% improve after 15 epochs. The architecture, learning rate, and >8m 1.145 1.187 1.216 0.4% early-stopping were optimized with SHERPA. Results Finally, we explore the impact of increasing the size of the training set on the discrepancy between including and To compare the three types of models after hyper-parameter not including CWAVE parameters in our final model. In tuning, we trained each on data from 2015-2016, and tested this experiment, we fix the hyper-parameters, train on data on events from 2018, enabling us to explore the relative ben- from 2015-2017 (568,626 examples), then test on 2018. The efits of deep learning and the use of image spectra features. models are trained for a fixed 30 epochs where the learn- Table 2 shows that the deep neural network trained on image ing rate is annealed by a factor of 0.4 every 10 epochs. Ini- spectra achieves a significantly lower root mean squared er- tial learning rate and dropout are identical to that of our ror (RMSE) of 0.33 meters compared to the other methods optimal deep neural network architecture. Figure 4 shows that rely only on the high-level features: 0.64 m for the lin- the mean performance of six randomly-initialized networks ear model and 0.43 m for the deep neural network trained on trained with different fractions of the data set. An ensem- CWAVE alone. Furthermore, this performance improvement ble (arithmetic mean) of the 6 models using all features and is consistent across small, medium, and large waves. the complete training set gives a test RMSE of 0.307 — a 50% reduction in RMSE from the previous state-of-the-art of 0.6 m (Stopa and Mouche 2017). This also approaches Table 2: Root Mean Squared Error on Test Set the RMSE of satellite altimetry compared to buoy observa- tions (Ribal and Young 2019). Percentage Wave CWAVE CWAVE Deep of Discussion Height Linear NN NN Total Data All Waves 0.642 0.433 0.327 100% Our results demonstrate that a deep convolutional neural <1m 0.827 0.443 0.392 1.4% network can extract useful representations from SAR im- 1m - 3m 0.515 0.377 0.255 66.4% age spectra that is not captured by engineered CWAVE fea- 3m - 8m 0.781 0.514 0.426 31.8% tures. In a direct comparison between two hyper-parameter >8m 3.226 1.512 1.216 0.4% optimized deep neural networks, the network with the im- age spectra information obtained a 29% reduction in RMSE (0.33 m vs. 0.43 m). This is in keeping with the success of A feature importance study (Table 2) shows the de- deep learning in other fields, where the expertly engineered pendence on each set of features. Two additional models features are discarded in favor of learned features. mance Center (4000107360/12/I-LG) and Sentinel-1 Ocean Study (S1-4SCI-16-0002). The altimetry data was sourced from the Integrated Marine Observing System (IMOS). All Sentinel-1 L2 data used in this study can be obtained from the Copernicus Data Hub (cophub.copernicus.eu). The au- thors would like to thank NVIDIA for a hardware grant to PS. The technical support and advanced computing re- sources from the University of Hawai‘i Information Tech- nology Services Cyberinfrastructure are gratefully acknowl- edged. References Baldi, P.; Sadowski, P.; and Whiteson, D. 2014. Searching for exotic particles in high-energy physics with deep learn- ing. Nature Communications 5. Figure 4: Performance of with both SAR image spectra and Collard, F.; Ardhuin, F.; and Chapron, B. 2009. Monitoring CWAVE parameters (blue line) and only SAR image spec- and analysis of ocean swell fields from space: New methods tra (orange curve), while varying the training set size. Each for routine observations. Journal of Geophysical Research point is an average of MSE objective over 6 trials with dif- 114(C7). ferent random weight initializations. Duvenaud, D. K.; Maclaurin, D.; Iparraguirre, J.; Bombarell, R.; Hirzel, T.; Aspuru-Guzik, A.; and Adams, R. P. 2015. Convolutional networks on graphs for learning molecular Our results show that there is still some advantage to in- fingerprints. In Advances in Neural Information Processing cluding the CWAVE features in the model, even with a train- Systems, 2215–2223. ing set of over 500,000 examples. However, we also show that this advantage diminishes as the number of training ex- Engen, G., and Johnsen, H. 1995. SAR-ocean wave in- amples increases (Figure 4). The CWAVE features, being version using image cross spectra. IEEE Transactions on derived from the image data, provide no additional infor- Geoscience and Remote Sensing 33(4):1047–1056. mation, but they help bias the model towards a good solu- He, K.; Zhang, X.; Ren, S.; and Sun, J. 2015. Delving deep tion. As the training data set increases, the model is slower into rectifiers: Surpassing human-level performance on im- to overfit, and the benefit of including the CWAVE features agenet classification. In Proceedings of the IEEE Interna- disappears. tional Conference on Computer Vision, 1026–1034. Latitude and longitude information is useful for predict- Hertel, L.; Collado, J.; Sadowski, P.; and Baldi, P. 2018. ing significant wave heights because there are regional char- Sherpa: Hyperparameter optimization for machine learning acteristics of the wave climate (Stopa et al. 2013). How- models. ever, the feature importance study shows that our model only makes minimal use of this information. This is encouraging, Johnsen, H., and Collard, F. 2009. Sentinel-1 ocean swell because it implies that the model is relying almost entirely wave spectra (osw) algorithm definition. Technical report, on the direct measurements rather than geographical infor- Northern Research Institute. mation. Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. Conclusion Krizhevsky, A.; Sutskever, I.; and Hinton, G. 2014. Ima- Our results demonstrate that deep learning provides a 50% genet classification with deep convolutional neural. In Neu- decrease in RMSE compared to the previous state-of-the- ral Information Processing Systems, 1–9. art in predicting significant wave height from SAR. Instead of relying on the set of engineered CWAVE features that Li, X.-M.; Lehner, S.; and Bruns, T. 2011. Ocean wave inte- capture most of the discriminative information, our deep gral parameter measurements using envisat asar wave mode learning approach learns directly from the low-level, high- data. IEEE Transactions on Geoscience and Remote Sensing dimensional image spectra. Furthermore, our results indicate 49(1):155–174. that there is still room for improvement with additional train- Lusci, A.; Pollastri, G.; and Baldi, P. 2013. Deep architec- ing data. Thus, we should we should expect the performance tures and deep learning in chemoinformatics: the prediction of our model to increase as more co-location events are col- of aqueous solubility for drug-like molecules. Journal of lected over the next couple years. Chemical Information and Modeling 53(7):1563–1575. Pleskachevsky, A.; Jacobsen, S.; Tings, B.; and Schwarz, E. Acknowledgments 2019. Estimation of sea state from sentinel-1 synthetic aper- This work was made possible thanks to SAR data access ture radar imagery for maritime situation awareness. Inter- granted by ESA projects: Sentinel-1 A Mission Perfor- national Journal of Remote Sensing 40(11):4104–4142. Ribal, A., and Young, I. R. 2019. 33 years of globally cal- ibrated wave height and wind speed data based on altimeter observations. Scientific Data 6(1). Sadowski, P., and Baldi, P. 2018. Deep learning in the natu- ral sciences: applications to physics. In Braverman Readings in Machine Learning. Key Ideas from Inception to Current State. Springer. 269–297. Schulz-Stellenfleth, J.; König, T.; and Lehner, S. 2007. An empirical approach for the retrieval of integral ocean wave parameters from synthetic aperture radar data. Journal of Geophysical Research 112(C3). Stopa, J. E., and Mouche, A. 2017. Significant wave heights from sentinel-1 sar: Validation and applications. Journal of Geophysical Research: Oceans 122(3):1827–1848. Stopa, J. E.; Cheung, K. F.; Tolman, H. L.; and Chawla, A. 2013. Patterns and cycles in the climate forecast system re- analysis wind and wave data. Ocean Modelling 70:207–220. Stopa, J. E.; Ardhuin, F.; Husson, R.; Jiang, H.; Chapron, B.; and Collard, F. 2016. Swell dissipation from 10 years of envisat advanced synthetic aperture radar in wave mode. Geophysical Research Letters 43(7):3423–3430. Wang, C.; Mouche, A.; Tandeo, P.; Stopa, E. J.; Longepe, N.; Erhard, G.; Foster, R.; Vandemark, D.; and Chapron, B. 2019. A labeled Ocean SAR Imagery Dataset of Ten Geophysical Phenomena from Sentinel-1 Wave Mode. Geo- science Data Journal. Young, I. R.; Zieger, S.; and Babanin, A. V. 2011. Global trends in wind speed and wave height. Science 332(6028):451–455.