<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Clustering Techniques Versus Binary Thresholding for Detection of Signal Tracks in Ionograms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Artem M. Grachev</string-name>
          <email>amgrachev@hse.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey Shiriy</string-name>
          <email>andreyschiriy@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>87</fpage>
      <lpage>92</lpage>
      <abstract>
        <p>An ionogram is a display of the data produced by an ionosonde: a graph of the virtual height of the ionosphere plotted against frequency. In addition to the “useful signal”, an ionogram almost always contains noise of various kinds, the so-called background noise, which makes signal filtering an important task. Two groups of methods address this task. The first group comprises computer vision methods for image processing, namely different filters and image binarization. The second group comprises adapted clustering methods. In this paper, we show how several of these methods perform in separating the “useful signal” from noise and emissions.</p>
      </abstract>
      <kwd-group>
        <kwd>ionograms</kwd>
        <kwd>image filtering</kwd>
        <kwd>image processing</kwd>
        <kwd>similarity measures</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Radio sounding data are necessary for enhancing over-the-horizon
radar systems and shortwave communication systems, as well as for solving
many problems in radiophysics and geophysics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Usually, the results obtained by an ionosonde are represented by means of
ionograms [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. An ionogram of oblique radio sounding of the ionosphere shows
the dependence of the amplitude of the received signal on the sounding
frequency f and the group delay time [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Due to multipath shortwave propagation in the ionosphere, an ionogram
contains tracks of different signal modes. In addition to the useful signal,
ionogram images contain noise of different kinds. In Fig. 1, one can see the track
of a signal mode (a sloped body in the bottom left part of the ionogram),
background noise, and concentrated noise, i.e. vertical stripes.<fn id="fn1"><label>1</label><p>The ionogram data shown in the paper are available at https://drive.google.com/open?id=0Bxdto9RRxaqMY2pCYUI4eWR0T1U. More comprehensive datasets are
available from the second co-author by request.</p></fn></p>
      <p>When working with ionograms, one of the most important problems is to separate
the useful signal from the noise. There are several types of useful signals. In fact,
we have a problem similar to automatic classification or clustering, depending
on the availability of labeled training data.</p>
      <p>The rest of the paper is organised as follows. In Section 2, we consider signal
segmentation using image processing methods. In Section 3, we use machine
learning methods for the same purposes. We treat an input image as a dataset
with each pixel as a separate element and then cluster it. In Section 4, we try to
exploit the best of these methods to create our final algorithm. In the conclusion,
we briefly discuss relevant techniques and problems for future work.</p>
      <p>We should note that when we tested our methods, we tried several
configurations for our models (sometimes enumerating parameter values by grid search).
Of course, better parameter configurations may exist for a particular case.</p>
      <p><bold>Detection of signal tracks by image processing methods</bold></p>
      <p>In this approach, we consider an ionogram as an image. We need to filter out
the noise and isolate the signal track of an input ionogram. We have tested two
filters for image filtering: the median filter and the filter given by the matrix
below.</p>
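      <p>As a minimal sketch, the two filters could be written with NumPy and SciPy as follows. The kernel rows follow the pattern (1 1 1 0 1 1 1) of the matrix given in this section; the number of rows (7), the normalisation by the kernel sum, the helper names, the threshold value, and the synthetic test image are assumptions made only for illustration and are not taken from the paper.</p>
      <preformat>
```python
import numpy as np
from scipy import ndimage

# Each row is (1 1 1 0 1 1 1), as in the kernel matrix of this section.
# The row count (7) is an assumption made only for this sketch.
KER = np.tile(np.array([1, 1, 1, 0, 1, 1, 1], dtype=float), (7, 1))


def median_filter(image, size=3):
    """Replace each pixel by the median of its size-by-size neighbourhood."""
    return ndimage.median_filter(image, size=size)


def kernel_filter(image, kernel=KER):
    """Average each pixel's neighbourhood while ignoring its own column."""
    weights = kernel / kernel.sum()
    return ndimage.convolve(image.astype(float), weights, mode="nearest")


def binarize(image, threshold):
    """Binary thresholding: 1 where brightness exceeds the threshold, else 0."""
    return (image > threshold).astype(np.uint8)


# Synthetic "ionogram": a sloped track plus one bright vertical stripe.
img = np.zeros((40, 40))
for i in range(5, 35):
    img[i, i - 2:i + 3] = 1.0   # sloped signal track, 5 pixels wide
img[:, 30] = 1.0                # concentrated noise (vertical stripe)

smoothed = kernel_filter(img)
mask = binarize(smoothed, threshold=0.5)
```
      </preformat>
      <p>Because the centre column of the kernel is zero, a one-pixel-wide vertical stripe contributes nothing to its own filtered value, while a wider sloped track keeps most of its brightness. This is one plausible reading of why such a kernel suits concentrated noise.</p>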
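      <p>
        <disp-formula><tex-math><![CDATA[\mathrm{Ker} = \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 1 & 1 & 1 \\ \vdots & & & \vdots & & & \vdots \\ 1 & 1 & 1 & 0 & 1 & 1 & 1 \end{pmatrix}]]></tex-math></disp-formula>
      </p>
      <p>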
Another approach is based on representing the ionogram as triples
⟨x, y, V⟩, one for each original pixel, where x and y are the pixel’s coordinates and V
is the pixel brightness. After this transformation, we cluster the triples.
We hypothesise that the signal’s pixels should belong to a separate
cluster. This approach is similar to the well-known image segmentation methods
that one can find, for example, in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
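      <p>The triple representation can be sketched in a few lines of NumPy. The function names <monospace>image_to_triples</monospace> and <monospace>labels_to_image</monospace> are hypothetical, and the row-major pixel order is an assumption of this sketch rather than something stated in the paper.</p>
      <preformat>
```python
import numpy as np


def image_to_triples(image):
    """Return an (H*W, 3) array whose rows are (x, y, V) for every pixel."""
    height, width = image.shape
    ys, xs = np.mgrid[0:height, 0:width]
    return np.column_stack([xs.ravel(), ys.ravel(), image.ravel()]).astype(float)


def labels_to_image(labels, shape):
    """Put one value per pixel back on the image grid (row-major order)."""
    return np.asarray(labels).reshape(shape)


# Tiny 2x3 example: brightness values 0..5.
img = np.arange(6, dtype=float).reshape(2, 3)
triples = image_to_triples(img)
```
      </preformat>
      <p>Any per-pixel vector produced from these triples, such as cluster labels, can be mapped back to an image with <monospace>labels_to_image</monospace> as long as the pixel order is left unchanged.</p>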
      <p>
        After clustering, we again represent the results as an image by replacing the
brightness value of each input pixel with its cluster label. Three methods
from the scikit-learn machine learning library [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] have been applied:
        <list list-type="order">
          <list-item><p>K-Means</p></list-item>
          <list-item><p>DBSCAN [<xref ref-type="bibr" rid="ref6">6</xref>]</p></list-item>
          <list-item><p>Mean shift [<xref ref-type="bibr" rid="ref7">7</xref>]</p></list-item>
        </list>
      </p>
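      <p>A hedged sketch of how the three scikit-learn estimators could be applied to the pixel triples and turned back into label images is given below. The synthetic image, the rescaling of coordinates to [0, 1], and all parameter values (<monospace>n_clusters</monospace>, <monospace>eps</monospace>, <monospace>min_samples</monospace>, <monospace>bandwidth</monospace>) are illustrative assumptions, not the settings used in the paper.</p>
      <preformat>
```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans, MeanShift

rng = np.random.default_rng(0)

# Synthetic 30x30 "ionogram": weak background noise plus a bright sloped track.
img = rng.uniform(0.0, 0.1, size=(30, 30))
for i in range(5, 25):
    img[i, i - 1:i + 2] = 1.0

ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
# One (x, y, V) triple per pixel; coordinates are rescaled to [0, 1]
# so that brightness is not swamped by position (an assumption of this sketch).
X = np.column_stack([xs.ravel() / xs.max(), ys.ravel() / ys.max(), img.ravel()])

models = {
    "K-Means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=0.15, min_samples=5),
    "Mean shift": MeanShift(bandwidth=0.6),
}

# Replace each pixel's brightness by its cluster label.
label_images = {
    name: model.fit_predict(X).reshape(img.shape) for name, model in models.items()
}
```
      </preformat>
      <p>Each entry of <monospace>label_images</monospace> has the shape of the input image and can be displayed directly. How well a given method isolates the track depends on the relative scaling of coordinates and brightness.</p>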
      <p>The last two methods have been chosen since they do not need to know the
number of clusters in advance; moreover, according to the locality hypothesis, they
can capture both similarity in signal/noise values and spatial closeness along the
x and y axes (in fact, frequency f and group delay time).</p>
      <p>DBSCAN worked rather well visually. The main disadvantage of this method
is the need to configure its parameters separately for each image. In Fig. 4,</p>
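      <p>The next example was launched with ε = 1, N = 50 and with the following coordinate
transformation:
        <disp-formula id="eq2"><label>(2)</label><tex-math><![CDATA[x_{\mathrm{new}} = \frac{x_{\mathrm{old}}}{\max(x_{\mathrm{old}})} \cdot 10, \qquad y_{\mathrm{new}} = \frac{y_{\mathrm{old}}}{\max(y_{\mathrm{old}})} \cdot 10]]></tex-math></disp-formula>
      </p>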
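      <p>The configuration above can be sketched with scikit-learn as follows. We assume that N corresponds to the <monospace>min_samples</monospace> parameter of <monospace>DBSCAN</monospace> and that the rescaled coordinates are combined with the raw brightness; the helper <monospace>rescale_coordinates</monospace> and the synthetic image are hypothetical.</p>
      <preformat>
```python
import numpy as np
from sklearn.cluster import DBSCAN


def rescale_coordinates(xs, ys, scale=10.0):
    """x_new = x_old / max(x_old) * 10, and likewise for y."""
    return xs / xs.max() * scale, ys / ys.max() * scale


rng = np.random.default_rng(1)
img = rng.uniform(0.0, 0.2, size=(60, 60))          # background noise
for i in range(10, 50):
    img[i, i - 3:i + 4] = rng.uniform(5.0, 5.2, 7)  # bright sloped track

ys, xs = np.mgrid[0:60, 0:60].astype(float)
x_new, y_new = rescale_coordinates(xs.ravel(), ys.ravel())
X = np.column_stack([x_new, y_new, img.ravel()])

# Assumption: the paper's N is DBSCAN's min_samples
# (number of neighbours needed for a core point).
labels = DBSCAN(eps=1.0, min_samples=50).fit_predict(X)
label_image = labels.reshape(img.shape)   # -1 marks points DBSCAN treats as noise
```
      </preformat>
      <p>With coordinates mapped to the range [0, 10], a radius of ε = 1 covers roughly one tenth of the image width, which gives the parameter a comparable meaning across images of different sizes.</p>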
      <p>In the figures above, machine learning methods have been applied to the
original image. However, we should note that we get better results if we first
apply filtering and then clustering.</p>
      <p>It turns out that the most appropriate method for this task is Mean shift
applied after image filtering. The Python implementation of Mean shift allows
us to choose the Parzen window size automatically for each image. The size depends
on the distances between objects; we have used the 70th percentile of all pairwise
distances. This property makes Mean shift much more suitable than
DBSCAN, since DBSCAN needs individual settings for each image. Another
advantage of Mean shift is its speed. Here we have also used the coordinate
transformation from Eq. 2.</p>
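      <p>A minimal sketch of this pipeline, assuming a median filter as the filtering step and scikit-learn's <monospace>estimate_bandwidth</monospace> with <monospace>quantile=0.7</monospace> as the automatic window-size rule, is shown below. Note that <monospace>estimate_bandwidth</monospace> averages nearest-neighbour distances at the given quantile, so it only approximates the percentile rule described above; the subsample size, <monospace>bin_seeding</monospace>, and the synthetic image are likewise assumptions.</p>
      <preformat>
```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(2)
img = rng.uniform(0.0, 0.2, size=(40, 40))     # background noise
for i in range(8, 32):
    img[i, i - 2:i + 3] = 5.0                  # bright sloped track

filtered = ndimage.median_filter(img, size=3)  # filtering first, then clustering

ys, xs = np.mgrid[0:40, 0:40].astype(float)
X = np.column_stack([
    xs.ravel() / xs.max() * 10.0,              # coordinate rescaling of Eq. 2
    ys.ravel() / ys.max() * 10.0,
    filtered.ravel(),
])

# Assumption: quantile=0.7 approximates the "70th percentile" rule in the text.
bandwidth = estimate_bandwidth(X, quantile=0.7, n_samples=500, random_state=0)
model = MeanShift(bandwidth=bandwidth, bin_seeding=True)
label_image = model.fit_predict(X).reshape(img.shape)
```
      </preformat>
      <p>The number of clusters found this way depends strongly on the quantile and on how brightness is scaled relative to the coordinates, so the values here should be treated as a starting point only.</p>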
      <p><bold>Conclusion</bold></p>
      <p>This paper presents the first steps of a comparison of image processing and
machine learning techniques for signal detection in ionograms. Both groups of
methods are suitable for noise filtering and isolation of the original (important) signal.
We have compared several methods of computer vision and machine learning for
this problem. It seems that Mean shift works better than its two competitors
in the conducted comparison. In the future, we plan to apply deep learning
methods for better signal detection based on a large set of ionograms. Using an
autoencoder for automatic clustering of signal types is an attractive opportunity
as well. Other image segmentation techniques that are widely used in the computer
vision community are also highly relevant.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Shiriy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Development and modeling of algorithms for automatic measurement of the parameters of ionospheric shortwave radiolines</article-title>
          .
          <source>PhD thesis</source>
          , Saint Petersburg State University of Telecommunications after M. A. Bonch-Bruevich
          (
          <year>2007</year>
          )
          (In Russian)
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kolchev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shumaev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shiriy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Equipment for Research of HF Ionospheric Multipath Propagation Effect</article-title>
          .
          <source>Journal of Instrument Engineering</source>
          <volume>51</volume>
          (
          <issue>12</issue>
          ) (
          <year>2008</year>
          )
          <fpage>73</fpage>
          -
          <lpage>78</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Interpreting digital ionograms</article-title>
          .
          <source>RadCom</source>
          (RSGB)
          <volume>85</volume>
          (
          <issue>05</issue>
          ) (
          <year>2009</year>
          )
          <fpage>44</fpage>
          -
          <lpage>46</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Forsyth</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponce</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>Computer Vision - A Modern Approach</source>
          , Second Edition. Pitman (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buitinck</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Louppe</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mueller</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning without learning the machinery</article-title>
          .
          <source>GetMobile</source>
          <volume>19</volume>
          (
          <issue>1</issue>
          ) (
          <year>2015</year>
          )
          <fpage>29</fpage>
          -
          <lpage>33</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>A density-based algorithm for discovering clusters in large spatial databases with noise</article-title>
          , AAAI Press (
          <year>1996</year>
          )
          <fpage>226</fpage>
          -
          <lpage>231</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Comaniciu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Mean shift: A robust approach toward feature space analysis</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell</source>
          .
          <volume>24</volume>
          (
          <issue>5</issue>
          ) (May
          <year>2002</year>
          )
          <fpage>603</fpage>
          -
          <lpage>619</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>