Convolutional Neural Networks for Feature Extraction and Automated Target Recognition in Synthetic Aperture Radar Images

John Geldmacher, Christopher Yerkes, and Ying Zhao, Member, IEEE

Abstract—Advances in the development of deep neural networks and other machine learning algorithms, combined with ever more powerful hardware and the huge amount of data available on the internet, have led to a revolution in ML research and applications. These advances present massive potential and opportunity for military applications such as the analysis of Synthetic Aperture Radar (SAR) imagery. SAR imagery is a useful tool capable of capturing high resolution images regardless of cloud coverage and at night. However, there is a limited amount of publicly available SAR data to train a machine learning model. This paper shows how to successfully dissect, modify, and re-architect cross-domain object recognition models such as the VGG-16 model, transfer learning models from ImageNet, and the k-nearest neighbor (kNN) classifier. The paper demonstrates that the combination of these factors can significantly and effectively improve the automated target recognition (ATR) of clean and noisy SAR images. The paper presents a potentially inexpensive and accurate transfer- and unsupervised-learning SAR ATR system for when data labels are scarce and data are noisy, simplifying the whole recognition process for tactical operational requirements in the area of SAR ATR.

Keywords—k-nearest neighbor, kNN, deep learning, Synthetic Aperture Radar images, SAR images, transfer learning, VGG-16

This will certify that all author(s) of the above article/paper are employees of the U.S. Government and performed this work as part of their employment, and that the article/paper is therefore not subject to U.S. copyright protection. No copyright. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: Proceedings of AAAI Symposium on the 2nd Workshop on Deep Models and Artificial Intelligence for Defense Applications: Potentials, Theories, Practices, Tools, and Risks, November 11-12, 2020, Virtual, published at http://ceur-ws.org/

J. Geldmacher, C. Yerkes, and Y. Zhao are with the Department of Information Sciences, Naval Postgraduate School, Monterey, CA, USA. C. Yerkes is also with the Oettinger School of Science and Technology Intelligence, National Intelligence University, Bethesda, MD, USA.

I. INTRODUCTION

The analysis and classification of targets within imagery captured by aerial and space-based systems provides the US intelligence community and military geospatial intelligence (GEOINT) personnel with important insights into adversary force dispositions and intentions. It has also entered the mainstream thanks to openly available tools like Google Earth. The high resolution of space-based sensors and the common use of overhead imagery in everyday life mean that, with the exception of decoys and camouflage, an average person is now reasonably capable of identifying objects in electro-optical (EO) imagery.

EO images are, however, limited by cloud coverage and daylight. About half of the time when a satellite in low earth orbit could image a target it will be night, necessitating the use of either an infrared (IR) or a synthetic aperture radar (SAR) sensor. Both IR and SAR images require a trained imagery analyst to reliably identify targets. A repetitive and time-consuming task that currently requires human expertise, but little creativity, is an ideal problem for deep learning. Automated target recognition (ATR) seeks to reduce the total workload of analysts so that their effort can be spent on the more human-centric tasks, like presenting and explaining intelligence to a decision maker. ATR is also intended to reduce the time from collection to exploitation by screening images at machine speed rather than manually. SAR ATR is complicated by the data available to train and assess machine learning models. Unlike other image classification tasks, there is not a large and freely available amount of training data for researchers. Further, the data that is publicly available covers only a small fraction of the types of targets an effective SAR ATR system would be required to identify.

II. ADVANTAGES AND CHALLENGES OF SYNTHETIC APERTURE RADAR (SAR) IMAGES, DATA DESCRIPTION, AND RELATED WORK

Synthetic Aperture Radar (SAR) is a radar mounted on a moving platform that uses the platform's motion to approximate the effect of a large antenna. The high resolution that can be achieved by creating a radar with an effective aperture much greater in size than is physically possible allows radar returns to be processed into images similar to what can be achieved with a camera [19]. SAR imagery provides an important tool for the United States Intelligence Community and military geospatial intelligence (GEOINT) personnel because of its all-weather, day/night collection capability. Additionally, some wavelengths that SAR imaging systems operate in have a degree of foliage and ground penetrating capability, allowing for the detection of buried objects or objects under tree cover that would not be observable by other sensors such as EO sensors.

These important advantages of SAR imaging for GEOINT analysts do come with some significant drawbacks inherent to SAR images. Because SAR images are not true optical images, they are susceptible to noise generated by constructive and destructive interference between radar reflections, which appears as bright or dark spots called "speckle" in the image [19]. Also, various materials and geometries will reflect the radar pulses differently, creating blobs or blurs that can obscure an object's physical dimensions. These issues, as well as problems caused by Doppler shift in moving objects and radar shadows, make the identification and classification of objects in SAR images a difficult and tedious task that also requires a well-trained and experienced analyst.

The Moving and Stationary Target Acquisition and Recognition (MSTAR) data set is a publicly available data set consisting of synthetic aperture radar images of the following 10 classes of military vehicles:
1) 2S1: former Soviet Union (FSU) self-propelled artillery
2) BMP-2: FSU infantry fighting vehicle
3) BRDM-2: FSU armored reconnaissance vehicle
4) BTR-60: FSU armored personnel carrier
5) BTR-70: FSU armored personnel carrier
6) D7: Caterpillar tractor frequently used in combat engineering roles
7) T-62: FSU main battle tank
8) T-72: FSU main battle tank
9) ZIL-131: FSU general purpose 6x6 truck
10) ZSU-23-4: FSU self-propelled anti-aircraft gun
11) SLICY: the Sandia Laboratories implementation of cylinders (SLICY), consisting of simple geometric shapes such as cylinders, edge reflectors, and corner reflectors, which can be used for calibration of sensors or for modeling the propagation of radar reflections.

Fig. 1. Example Photographs and MSTAR Images by Class. Photograph of BMP-2 from https://www.militaryfactory.com/armor/detail.asp?armor_id=50; photograph of BTR-70 from https://military.wikia.org/wiki/BTR-70. All other photographs and SAR images adapted from the MSTAR dataset.

Fig. 1 shows example photographs and MSTAR images by class. It demonstrates the difficulties an imagery analyst would face when identifying targets in SAR imagery: vehicles that are easily recognizable in photos become blurs in SAR images. Due to its public availability and ease of access for researchers, the data set has become the standard for SAR image Automated Target Recognition (ATR) classification research.

ATR in SAR imagery using "shallow" classification methods, which are traditional classifiers applied directly to SAR images without the breakthrough feature extraction layers demonstrated in convolutional neural networks, produced generally good results. An SVM method proposed by [27] achieved 91% accuracy in a five-class test, while a Bayesian classifier reported 95.05% accuracy in a 10-class test [13]. In recent years, the work on classification of SAR imagery has focused on the use of CNNs. In 2015, Morgan showed that a relatively small CNN could achieve 92.1% accuracy across the 10 classes of the MSTAR dataset, roughly in line with the shallow methods previously explored. Morgan's method also showed that a network trained on nine of the MSTAR target classes could be retrained to include a tenth class 10-20 times faster than training a 10-class classifier from scratch. The ability to more easily adapt the model to changes in target sets represents an advantage over shallow classification techniques [11]. This is especially valuable in a military ATR context given the fluid nature of military operations, where changes to the order of battle may necessitate updating a deployed ATR system.

Malmgren-Hansen et al. explored transfer learning from a CNN pre-trained on simulated SAR images generated using ray tracing software and detailed computer-aided design models of target systems. They showed that model performance was improved, especially in cases where the amount of training data was reduced [10]. The technique of generating simulated SAR images for training could also be valuable in a military SAR ATR context, where an insufficient amount of training data for some systems may exist.

The use of a linear SVM as a replacement for the softmax activation typically used for multiclass classifiers in neural networks has been shown to be potentially more effective for some classification tasks [22]. Transfer learning from ImageNet to MSTAR with an SVM classifier was explored by [1] in 2018. Their methodology compared the performance of an SVM classifier trained on mid-level feature data extracted from multiple layers of the AlexNet, GoogLeNet, and VGG-16 neural networks without retraining the feature-extracting network. Although they reported over 99% accuracy when classifying features extracted from mid-level convolutional layers of AlexNet, performance of the SVM on features from fully-connected layers did not achieve 80% accuracy. The best performance reported from the VGG-16 architecture was 92.3% from a mid-level convolutional layer, but only 49.2% and 46.3% from features extracted in the last two fully-connected layers [1].

III. TRANSFER LEARNING AND FEATURE EXTRACTION

CNNs require a very large amount of data to train an accurate model, and it is not uncommon for data sets with tens or even hundreds of thousands of images to be used when training a model. Transfer learning presents one possible solution when training a CNN on a limited data set by leveraging knowledge from a previously learned source task to aid in learning a new target task [14]. In an image classification problem, transfer learning works by training a CNN on a data set that has a very large number of images, freezing the parameters for a certain number of layers, and extracting mid-level feature representations before training further layers and the final classification layer [7].

ImageNet is an open source labeled image database organized in a branching hierarchy of "synonym sets" or "synsets". For example, the "tank" synset is found in a tree going from vehicle to wheeled vehicle to self-propelled vehicle to armored vehicle to tank. The ImageNet database consists of over 14 million labeled images organized into over 21,000 synsets. Pre-trained ImageNet models are often used in transfer learning.

Fig. 2. The Original VGG-16 Architecture

Transfer learning is typically used when source and target tasks are not too dissimilar, in order to avoid negative transfer.
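The freeze-and-extract procedure described above can be sketched in Keras/TensorFlow. This is an illustrative reconstruction, not the authors' code: the function name, the 128x128 input size, and the choice of `block2_pool` as the mid-level layer are assumptions (freezing the input layer plus the first two convolutional/pooling blocks corresponds to the paper's "first six layers").

```python
import tensorflow as tf


def make_feature_extractor(n_frozen=7, layer_name="block2_pool",
                           input_shape=(128, 128, 3), weights="imagenet"):
    """Freeze the early layers of a pre-trained VGG-16 and expose a
    mid-level layer's activations as a feature map."""
    base = tf.keras.applications.VGG16(weights=weights, include_top=False,
                                       input_shape=input_shape)
    # Freeze the input layer plus the first two conv/pool blocks.
    for layer in base.layers[:n_frozen]:
        layer.trainable = False
    # A model that maps an image to the chosen mid-level feature map.
    return tf.keras.Model(inputs=base.input,
                          outputs=base.get_layer(layer_name).output)
```

For a 128x128 input, `block2_pool` emits a 32x32x128 feature map, i.e. 131,072 values when flattened, which matches the flattened dimension the paper reports for its six-layer extraction experiment.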
Negative transfer occurs when the features learned in the transfer learning method actually handicap the model performance [14]. However, transfer learning is made more useful by a curious phenomenon: many deep neural networks trained on natural images learn similar features across images from different domains. Evidence shows that low- and mid-level features can represent basic ATR features in images such as texture, corners, edges, and color blobs [9], and the low- and mid-level feature extraction performed by a neural network resembles the function of actual biological and human neurons. Low- and mid-level features extracted from CNNs are therefore likely common across even dissimilar data sets. A transfer learning approach between different domains is feasible, and ATR tasks are evidently successful in cross-domain applications [2], [8]. For example, the application of transfer learning to remote sensing target detection and classification was studied in [16], which showed that a CNN classifier trained on a photographic data set could be retrained to perform remote sensing classification of ships at sea with good performance.

In our experiment, the standard VGG-16 model is implemented in the Keras application program interface (API) with TensorFlow as the backend. The ImageNet weights available in Keras are ported from the Visual Geometry Group at Oxford University, which developed the VGG architecture for the ILSVRC-2014 localization and classification tasks [18]. We also use Orange, an open source data science and machine learning toolkit that allows users to easily manipulate data through a graphical user interface. Orange has several built-in machine learning algorithms and simplifies the data management and pre-processing requirements, allowing users to experiment with approaches to machine learning and data science [3].

IV. SAR ATR: FEATURE EXTRACTION COMBINED WITH SHALLOW CLASSIFIERS

A. Multistep Classifier

In practice and in cross-domain applications, very few people train an entire CNN from scratch because it is relatively rare to have a data set of sufficient size. For this reason, transfer learning with feature extraction combined with shallow classifiers is a suitable choice for SAR images.

The network architecture employed in this paper is a modified VGG-16 architecture [18]. The original VGG-16 architecture is shown in Fig. 2. It consists of linked convolutional/pooling blocks of two or three convolutional layers each, three fully-connected layers, and a softmax activation at the end to determine the class label. The network employs a 3x3 kernel and a stride of one so that each pixel is the center of a convolutional step. The architecture has been modified to freeze the model weights for the first two convolutional/pooling blocks (i.e., the first six layers) in order to take advantage of the broad feature detection of the pre-trained network, as shown in Fig. 3. The model top has also been replaced with a fully-connected layer, a dropout layer to mitigate overfitting, and two final fully-connected layers with a softmax activation for classification [16].

Fig. 3. VGG-16 with transfer learning, freezing the first six layers with weights taken directly from ImageNet

The CNN is trained on the 2200 training images with a 20% validation split. The training and test data were both then run through the retrained neural network. The last fully connected layer before the neural network's output was saved as a 1024-dimensional vector for each image, as shown in Fig. 3. Our method, shown in Fig. 4, saves the 1024 extracted dense-layer features and uses them as the input to a shallow classifier. The extracted features are run through the Orange workflow pictured in Fig. 5. Precision and recall are used to compare the performance of the base CNN and of the kNN, SVM, and random forest classifiers.
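The paper runs the extracted feature vectors through Orange; as an illustrative stand-in (the function name is hypothetical, and scikit-learn replaces Orange here), the same three shallow classifiers can be configured with the settings the paper reports: k = 11 with distance weighting for kNN, a sigmoid kernel for the SVM, and 10 trees for the random forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def fit_shallow_classifiers(features, labels):
    """Fit the three shallow classifiers on CNN-extracted feature
    vectors (e.g., the 1024-dimensional dense-layer outputs)."""
    classifiers = {
        # k = 11, distance-weighted Euclidean neighbors
        "kNN": KNeighborsClassifier(n_neighbors=11, weights="distance"),
        # sigmoid-kernel support vector machine
        "SVM": SVC(kernel="sigmoid"),
        # random forest of 10 decision trees
        "RF": RandomForestClassifier(n_estimators=10),
    }
    for clf in classifiers.values():
        clf.fit(features, labels)
    return classifiers
```

Prediction then reduces to, e.g., `fit_shallow_classifiers(train_feats, train_labels)["kNN"].predict(test_feats)`, so the neural network is used only once per image as a feature extractor.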
This is also referred to as a modified VGG-16 architecture, or simply a VGG-16 architecture, in this paper. It was initialized with the ImageNet weights and had the first two convolutional/pooling blocks frozen for training.

Fig. 4. The multistep classifier: extract features from the VGG-16 and then apply a shallow classifier

Fig. 5. The Orange Workflow

For the kNN classifier, k was set to 11 and weighted Euclidean distance was used to determine which class label to assign to test images. A sigmoid kernel was used in the SVM classifier, and the random forest consisted of 10 decision trees. These settings were unchanged in experiments two and three.

B. Results

The baseline model, which is the modified VGG-16, is shown in Fig. 3. The modified VGG-16 without transfer learning, trained exclusively on MSTAR, resulted in an average precision and recall of 0.96. The same modified VGG-16 with full transfer learning of the convolutional layers, with weights from ImageNet, resulted in an average precision and recall of 0.88. Although the transfer learning approach has the advantage of converging much more quickly than a CNN initialized with random weights, full transfer learning of all convolutional weights with retraining of only the CNN top did not match the performance of the non-transfer-learning approach, suggesting that some negative transfer occurs in the later convolutional layers. As shown in Fig. 6, the best performer is the modified VGG-16 with partial transfer learning of Fig. 3, with an average precision and recall of 0.98. The multistep classifier using a kNN classifier, as in Fig. 4, was able to match the best baseline performance with an average precision and recall of 0.98, while the SVM and random forest classifiers fell short of the baseline model's performance.

Fig. 6. Comparison of VGG-16 with the multistep classifiers

C. Adding Noise

As described before, ATR of SAR images is typically sensitive to noise in the images. A CNN is known to be vulnerable to noise, both from environmental perturbation and from an adversary's deliberate manipulations [6], [21]. To study the effect, random Gaussian noise with a noise factor of 0.2 was added to the images from the data set. Fig. 7 shows an example of an original SAR image and one with added noise. The feature extraction from the CNN and the follow-on shallow classification process were then repeated without retraining the base model in order to test the robustness of the model. The baseline model (i.e., the modified VGG-16) was then retrained on the noisy images for 30 epochs and accuracy was compared.

Fig. 7. Examples of Noisy SAR images

Neither the neural network nor any of the multistep classifiers proved robust enough to handle the addition of random noise to the images. However, after retraining, the kNN and SVM multistep classifiers perform better than the modified VGG-16 with partial transfer learning.

Fig. 8. Comparison of VGG-16 with the multistep classifiers for added noise

V. DISCUSSION

Performance on the SLICY class is of interest because it demonstrates the model's ability to discriminate an invalid target from a valid target. All other classes, with the exception of the D7, are former Soviet Union military equipment. The D7 is a Caterpillar bulldozer; up-armored versions of the D7 and related equipment are often used in combat engineering roles, which in a military context means they are likely to be a valid target. As demonstrated by the high precision and recall in this class across all models, valid targets are very rarely classified as a SLICY (high precision) and random objects are not being accepted as valid targets (high recall).

The performance of the kNN classifier is also notable: the use of an SVM for classification after feature extraction has been studied previously, but little research has been done on the use of kNN in place of a softmax activation for neural network output.
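The noise-injection step of Section IV-C is described but not shown; a minimal NumPy sketch consistent with that description (zero-mean Gaussian noise scaled by a 0.2 noise factor, applied to images normalized to [0, 1]; the function name and seeding are illustrative assumptions) could be:

```python
import numpy as np


def add_test_noise(images, noise_factor=0.2, seed=None):
    """Add zero-mean Gaussian noise to an array of images scaled to
    [0, 1], mirroring the robustness test described in Section IV-C."""
    rng = np.random.default_rng(seed)
    noisy = images + noise_factor * rng.standard_normal(images.shape)
    # Keep pixel values in the valid [0, 1] range after perturbation.
    return np.clip(noisy, 0.0, 1.0)
```

The perturbed copies are then fed through the unchanged feature extractor and shallow classifiers, so any drop in precision and recall is attributable to the noise rather than to retraining.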
To further explore the relations among feature extraction, transfer learning, and kNN, we ran an additional experiment in which we first extracted the transfer weights of the first six layers of the VGG-16 architecture from ImageNet. Since the flattened dimension is 32x32x128 = 131,072, as shown in Fig. 9, we applied the unsupervised k-means algorithm to group the 131,072 dimensions into 2048 clusters. The reasoning here is that the first six layers probably embed the best features (texture, corners, edges, and color blobs) that can be used for classification. Finally, we performed kNN and other supervised learning methods in Orange on the 2048-dimensional train and test data.

Fig. 9. VGG-16 TensorFlow architecture layout

Fig. 10 shows the test data results from Orange for the VGG6-transfer-kmeans-kNN method, with an average precision and recall of 0.93. The six layers of transfer learning together with k-means and kNN provide an inexpensive (without GPU or AWS, for example) approach for SAR ATR that requires no supervised learning and no class labels. Recently, various learning-to-hash algorithms [24] have been used to approximate exact nearest neighbors, translating a supervised learning problem and kNN into an index/search problem [15] and simplifying the whole recognition process for tactical operational requirements in the area of SAR ATR. If no class labels for SAR are available, our multistep classifiers with transfer learning and kNN can provide an unsupervised classification with high accuracy and confidence, matching an object that looks like another object seen before. Fig. 10 also shows the comparison of kNN with other supervised learning methods. The kNN method is the best among all the methods for average precision and recall, where classification of the SLICY has a precision of 0.98 and a recall of 1. Future work is to test on more and different data sets (e.g., EO and IR data) to validate whether the multistep methods can apply to cross-domain ATR problems.

Fig. 10. Multistep classifier results: VGG-6 transfer learning features + k-means (k=2048) + kNN and other methods in Orange

Currently, the analysis community does not have an established standard for the percentage of targets correctly identified by an imagery analyst. Instead, the analysis relies on the user's experience and confidence in their own work, providing responses such as "possible main battle tank" or "likely BMP-2", and thus a direct comparison to expert-level performance is difficult to establish. Both the baseline model employing transfer learning and the shallow classifiers using a neural network as a feature extractor performed with a high degree of accuracy and would be valuable in an operational context as an aid to GEOINT analysts.

VI. CONCLUSION

Cross-domain transfer learning from photographs to SAR imagery is effective for training a neural network both for feature extraction and for classification. A retrained neural network can function as an efficient feature extractor for training a shallow classifier. kNN and SVM classifiers are potentially useful replacements for the softmax activation in a neural network. Multistep classification methods using a shallow classifier trained on features extracted from a neural network outperformed the base neural network when tested on noisy data and as the amount of training data decreased. This is valuable for improving CNNs in the broader machine vision community by applying feature extraction followed by shallow classifiers to clean and noisy images. Transfer learning and kNN multistep classification methods could be significant for setting up a robust image indexing system with minimal supervised training and learning required.

REFERENCES
[1] Al Mufti, M., Al Hadhrami, E., Taha, B., & Werghi, N. (2018). Automatic target recognition in SAR images: Comparison between pre-trained CNNs in a transfer learning based approach. 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD). https://doi.org/10.1109/ICAIBD.2018.8396186
[2] Chen, D., Liu, S., Kingsbury, P., Sohn, S., Storlie, C. B., Habermann, E. B., Naessens, J. M., Larson, D. W., & Liu, H. (2019). Deep learning and alternative learning strategies for retrospective real-world clinical data. Digital Medicine, 2:43. https://doi.org/10.1038/s41746-019-0122-0
[3] Demšar, J., Zupan, B., Leban, G., & Curk, T. (2004). Orange: From Experimental Machine Learning to Interactive Data Mining. In Knowledge Discovery in Databases: PKDD 2004 (Vol. 3202). Springer. https://doi.org/10.1007/978-3-540-30116-5_58
[4] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2009.5206848
[5] Gandhi, R. (2018, June 7). Support Vector Machine—Introduction to Machine Learning Algorithms. Towards Data Science. https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
[6] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. The MIT Press. ISBN: 0262035618
[7] Kang, C., & He, C. (2016). SAR image classification based on the multi-layer network and transfer learning of mid-level representations. 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). https://doi.org/10.1109/IGARSS.2016.7729290
[8] Khosravi, P., Kazemi, E., Zhan, Q., Malmsten, J. E., Toschi, M., Zisimopoulos, P., Sigaras, A., Lavery, S., Cooper, L., Hickman, C., Meseguer, M., Rosenwaks, Z., Elemento, O., Zaninovic, N., & Hajirasouliha, I. (2019). Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization. Digital Medicine, 2:21. https://doi.org/10.1038/s41746-019-0096-y
[9] Liu, L., Chen, J., Fieguth, P., Zhao, G., Chellappa, R., & Pietikäinen, M. (2019). From BoW to CNN: Two Decades of Texture Representation for Texture Classification. International Journal of Computer Vision, 127, 74–109.
[10] Malmgren-Hansen, D., Kusk, A., Dall, J., Nielsen, A., Engholm, R., & Skriver, H. (2017). Improving SAR Automatic Target Recognition Models With Transfer Learning From Simulated Data. IEEE Geoscience and Remote Sensing Letters, 14(9), 1484–1488.
[11] Morgan, D. (2015). Deep convolutional neural networks for ATR from SAR imagery. Algorithms for Synthetic Aperture Radar Imagery XXII, 9475. https://doi.org/10.1117/12.2176558
[12] Nielsen, M. (2015). Neural Networks and Deep Learning. Determination Press. http://neuralnetworksanddeeplearning.com/
[13] O'Sullivan, J. A., DeVore, M. D., Kedia, V., & Miller, M. I. (2001). SAR ATR performance using a conditionally Gaussian model. IEEE Transactions on Aerospace and Electronic Systems, 37(1), 91–108. https://doi.org/10.1109/7.913670
[14] Pan, S., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10). https://doi.org/10.1109/TKDE.2009.191
[15] Peng, T., Boxberg, M., Weichert, W., Navab, N., & Marr, C. (2019). Multi-task learning of a deep k-nearest neighbour network for histopathological image classification and retrieval. bioRxiv. https://doi.org/10.1101/661454
[16] Rice, K. (2018). Convolutional Neural Networks For Detection And Classification Of Maritime Vessels In Electro-Optical Satellite Imagery [Master's thesis, Naval Postgraduate School]. http://hdl.handle.net/10945/61255
[17] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A., & Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115, 211–252. https://doi.org/10.1007/s11263-015-0816-y
[18] Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. International Conference on Learning Representations 2015.
[19] Skolnik, M. (1981). Introduction to Radar Systems (2nd ed.). McGraw-Hill, Inc.
[20] Stewart, M. (2019, February 26). Simple Introduction to Convolutional Neural Networks. Towards Data Science. https://towardsdatascience.com/simple-introduction-to-convolutional-neural-networks-cdf8d3077bac
[21] Sitawarin, C., & Wagner, D. (2019). On the Robustness of Deep K-Nearest Neighbors. arXiv:1903.08333v1
[22] Tang, Y. (2013). Deep Learning using Linear Support Vector Machines. 2013 ICML Challenges in Representation Learning. https://arxiv.org/abs/1306.0239
[23] Ross, T. D., Bradley, J. J., Hudson, L. J., & O'Connor, M. P. (1999). SAR ATR: So what's the problem? An MSTAR perspective. Proc. SPIE 3721. https://doi.org/10.1117/12.357681
[24] Wang, J., Zhang, T., Song, J., Sebe, N., & Shen, H. T. (2017). A Survey on Learning to Hash. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[25] Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep Sparse Rectifier Neural Networks. In G. Gordon, D. Dunson, & M. Dudík (Eds.), Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (pp. 315–323). PMLR. http://proceedings.mlr.press/v15/glorot11a.html
[26] Yiu, T. (2019, June 12). Understanding Random Forest: How the Algorithm Works and Why it Is So Effective. Towards Data Science. https://towardsdatascience.com/understanding-random-forest-58381e0602d2f3c8
[27] Zhao, Q., & Principe, J. (2001). Support vector machines for SAR automatic target recognition. IEEE Transactions on Aerospace and Electronic Systems, 37(2), 643–654.