Clustering Techniques Versus Binary Thresholding for Detection of Signal Tracks in Ionograms

Artem M. Grachev and Andrey Shiriy
National Research University Higher School of Economics, Moscow, Russia
amgrachev@hse.ru, andreyschiriy@gmail.com

Abstract. An ionogram is a display of the data produced by an ionosonde. It is a graph of the virtual height of the ionosphere plotted against frequency. In addition to the “useful signal”, an ionogram almost always contains noise of different nature, the so-called background noise. That is why the signal filtering task is so important. There are two groups of methods for this task. The first group comprises computer vision methods for image processing, namely, various filters and image binarization. The second group includes adapted clustering methods. In this paper, we show how several methods perform when separating the “useful signal” from noise and emissions.

Keywords: ionograms, image filtering, image processing, similarity measures

1 Introduction

Radio sounding data are necessary for improving over-the-horizon radar systems and shortwave communication systems, as well as for solving many problems in radiophysics and geophysics [1]. Usually, the results obtained by an ionosonde are represented by means of ionograms [2]. An ionogram of oblique radio sounding of the ionosphere shows the amplitude of the received signal as a function of the sounding frequency f and the group delay time τ [3]. Due to multipath shortwave propagation in the ionosphere, an ionogram contains tracks of different signal modes. In addition to the useful signal, ionogram images contain noise of different nature. In Fig. 1, one can see the track of a signal mode (a sloped body in the bottom left part of the ionogram), background noise, and concentrated noise, i.e. vertical stripes¹. When working with ionograms, one of the most important problems is to filter the useful signal from the noise.
There are several types of useful signals. In fact, we face a problem similar to automatic classification or clustering, depending on the availability of training (labeled) data.

Fig. 1. Ionogram example

¹ The data of ionograms shown in the paper are available at https://drive.google.com/open?id=0Bxdto9RRxaqMY2pCYUI4eWR0T1U. More comprehensive datasets are available from the second co-author by request.

The rest of the paper is organised as follows. In Section 2, we consider signal segmentation using image processing methods. In Section 3, we use machine learning methods for the same purpose: we treat an input image as a dataset with each pixel as a separate element and then cluster it. In Section 4, we try to exploit the best of these methods to create our final algorithm. In the conclusion, we briefly discuss relevant techniques and problems for future work.

We should note that when testing our methods, we tried several configurations of our models (sometimes enumerating parameter values by grid search). Of course, better parameter configurations may exist for a particular case.

2 Detection of signal tracks by image processing methods

In this approach, we consider an ionogram as an image. We need to filter out the noise and isolate the signal track of an input ionogram. We have tested two filters: the median filter and the filter given by the matrix below.

$$
\mathrm{Ker} =
\begin{bmatrix}
1 & 1 & 1 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 1
\end{bmatrix}
$$

In the next example, we show the original image, the results of applying the two filters to it, and its binarization by thresholding. Image binarization assigns each pixel to the signal or background class by thresholding.
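The two filters and the thresholding step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it uses NumPy only (the paper does not say which implementation was used), a 3 × 3 window for the median filter (the paper gives no window size), edge padding at the image borders, and a Ker filter normalised by the sum of its weights so that the output stays on the input's brightness scale. The function names `median_filter`, `ker_filter`, and `binarize` are hypothetical, and the threshold is left as a free parameter because the paper does not report its value.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Ker from Section 2: 6 rows x 7 columns with a zero middle column.
KER = np.array([[1, 1, 1, 0, 1, 1, 1]] * 6, dtype=float)


def _pad_for(kernel_shape, image):
    """Edge-pad the image (assumption) so filtered output keeps the input shape."""
    kh, kw = kernel_shape
    pad_h = ((kh - 1) // 2, kh // 2)
    pad_w = ((kw - 1) // 2, kw // 2)
    return np.pad(image, (pad_h, pad_w), mode="edge")


def median_filter(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Replace each pixel by the median of its size x size neighbourhood."""
    windows = sliding_window_view(_pad_for((size, size), image), (size, size))
    return np.median(windows, axis=(-2, -1))


def ker_filter(image: np.ndarray, kernel: np.ndarray = KER) -> np.ndarray:
    """Weighted mean over each window, with weights given by the kernel.

    Normalising by kernel.sum() is an assumption, not taken from the paper.
    """
    windows = sliding_window_view(_pad_for(kernel.shape, image), kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel) / kernel.sum()


def binarize(image: np.ndarray, threshold: float) -> np.ndarray:
    """1 (signal) where brightness exceeds the threshold, else 0 (background)."""
    return (image > threshold).astype(np.uint8)
```

For a brightness array `img` and a chosen threshold `t`, `binarize(ker_filter(img), t)` yields a 0/1 signal mask. Because Ker has a zero middle column, each output pixel is computed from the neighbouring columns without using the pixel's own column.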
In binarization, a single brightness threshold is applied to all the pixels: the pixels with brightness above the threshold belong to the first class, and the remaining ones belong to the second.

In Fig. 2, the original ionogram and its processed versions are shown in three color models; in the remaining figures, we use only one color model for illustration. Visually, filtering with the Ker matrix preserves the signal's shape and eliminates the noise better than the median filter does.

Fig. 2. Ionograms: a) the original image, b) preprocessing by the median filter, c) filtering with the matrix Ker, d) binarization

3 Detection of signal tracks by machine learning methods

Another approach is based on representing the ionogram as triples ⟨x, y, V⟩, one for each original pixel, where x and y are the pixel's coordinates and V is its brightness. After this transformation, we cluster the triples. We hypothesise that the signal's pixels should belong to a separate cluster. This approach is similar to well-known image segmentation methods that can be found, for example, in [4]. After clustering, we again represent the results as an image by replacing the brightness of each input pixel with its cluster label.

We have applied the following three methods from the scikit-learn machine learning library [5]:
1. K-Means
2. DBSCAN [6]
3. Mean shift [7]

The last two methods have been chosen since they do not require the number of clusters to be known in advance; moreover, under the locality hypothesis, they can capture both the similarity of signal/noise brightness values and spatial closeness along the x and y axes (in fact, f and τ).

DBSCAN produced visually good results. Its main disadvantage is the need to configure its parameters separately for each image. In Fig. 4, one can find the results of processing the original ionogram from Fig. 3 by DBSCAN with ε = 4 (the neighbourhood size) and N = 100 (the number of points within the neighbourhood). The coordinates are scaled as follows:

$$
x_{\mathrm{new}} = \frac{x_{\mathrm{old}}}{\max(x_{\mathrm{old}})}, \qquad
y_{\mathrm{new}} = \frac{y_{\mathrm{old}}}{\max(y_{\mathrm{old}})} \tag{1}
$$

The next example was run with ε = 1, N = 50, and the following coordinate transformation:

$$
x_{\mathrm{new}} = 10 \cdot \frac{x_{\mathrm{old}}}{\max(x_{\mathrm{old}})}, \qquad
y_{\mathrm{new}} = 10 \cdot \frac{y_{\mathrm{old}}}{\max(y_{\mathrm{old}})} \tag{2}
$$

In the figures above, machine learning methods have been applied to the original image. However, we should note that we obtain better results if we first apply filtering and then clustering.

Fig. 3. Original image
Fig. 4. DBSCAN results
Fig. 5. Original image
Fig. 6. Filtered image
Fig. 7. Mean shift results

It turns out that the most appropriate method for this task is Mean shift applied after image filtering (Figs. 5–7). The Python implementation of Mean shift allows us to choose the Parzen window size automatically for each image. The window size depends on the distances between objects; we have used the 70th percentile of all pairwise distances. This property makes Mean shift much more convenient than DBSCAN, which needs individual parameter settings for each image. Another advantage of Mean shift is its speed. Here we have also used the coordinate transformation from Eq. (2).

4 Conclusion

This paper presents first steps towards comparing image processing and machine learning techniques for signal detection in ionograms. Both groups of methods are suitable for noise filtering and isolation of the original (important) signal. We have compared several computer vision and machine learning methods for this problem. In the conducted comparison, Mean shift seems to work better than its two competitors. In the future, we plan to apply deep learning methods for better signal detection based on a large set of ionograms. Using an autoencoder for automatic clustering of signal types is an attractive opportunity as well.
Other image segmentation techniques that are widely used in the computer vision community are highly relevant as well.

References
1. Shiriy, A.: Development and modeling of algorithms for automatic measurement of the parameters of ionospheric shortwave radiolines. PhD thesis, Saint Petersburg State University of Telecommunications named after M.A. Bonch-Bruevich (2007) (in Russian)
2. Kolchev, A., Shumaev, V., Shiriy, A.: Equipment for Research of HF Ionospheric Multipath Propagation Effect. Journal of Instrument Engineering 51(12) (2008) 73–78
3. Williams, G.: Interpreting digital ionograms. RadCom (RSGB) 85(05) (2009) 44–46
4. Forsyth, D.A., Ponce, J.: Computer Vision: A Modern Approach, Second Edition. Pearson (2012)
5. Varoquaux, G., Buitinck, L., Louppe, G., Grisel, O., Pedregosa, F., Mueller, A.: Scikit-learn: Machine learning without learning the machinery. GetMobile 19(1) (2015) 29–33
6. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), AAAI Press (1996) 226–231
7. Comaniciu, D., Meer, P.: Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5) (May 2002) 603–619