<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ubiquitous Self-Organizing Maps</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bruno Silva</string-name>
          <email>bruno.silva@estsetubal.ips.pt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nuno Marques</string-name>
          <email>nmm@di.fct.unl.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DI/FCT - UNL</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>DSI/ESTSetúbal, Instituto Politécnico de Setúbal</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Knowledge discovery in ubiquitous environments is
usually conditioned by the data stream model: data is
potentially infinite, arrives continuously, and is subject to concept
drift. These factors pose additional challenges to standard
data mining algorithms. Artificial Neural Network (ANN)
models are still poorly explored in these settings.</p>
      <p>State-of-the-art methods for dealing with data streams are
single-pass modifications of standard algorithms, e.g.,
k-means for clustering, and involve some relaxation of the
quality of the results: since the data cannot be revisited to
refine the models, the goal is to achieve good approximations
[Gama, 2010]. In [Guha et al., 2003] an improved single-pass
k-means algorithm is proposed. However, k-means suffers
from the problem that the initial k centroids have to be set,
either randomly or through other methods, and this choice has
a strong impact on the quality of the clustering process. CluStream
[Aggarwal et al., 2003] is a framework that targets high-dimensional
data streams with a two-phased approach: an online phase
produces micro-clusters of the incoming data, while an offline
phase produces on-demand models from them, also with k-means.</p>
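      <p>The single-pass constraint can be sketched as an online k-means update in which each arriving observation adjusts only its nearest centroid and is then discarded. This is an illustrative sketch of the single-pass idea, not the algorithm of [Guha et al., 2003]; the function name and random initialization below are our own assumptions.</p>

```python
import numpy as np

def online_kmeans(stream, k, dim, seed=0):
    """Single-pass k-means sketch: each point updates its nearest
    centroid once and is then discarded (no revisiting of the data)."""
    rng = np.random.default_rng(seed)
    centroids = rng.normal(size=(k, dim))  # random init: result quality depends on it
    counts = np.zeros(k)
    for x in stream:
        i = np.argmin(np.linalg.norm(centroids - x, axis=1))  # nearest centroid
        counts[i] += 1
        centroids[i] += (x - centroids[i]) / counts[i]  # running-mean update
    return centroids
```

      <p>Note that the sensitivity to the random initialization mentioned above is visible here: a bad draw of the initial centroids cannot be corrected by revisiting the data.</p>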
      <p>In this position paper we address the use of
Self-Organizing Maps (SOM) [Kohonen, 1982], argue for their
strengths over current methods, and outline directions for
their adaptation to ubiquitous environments, which involves
dynamic estimation of the learning parameters based on
measuring concept drift over usually non-stationary underlying
distributions. In previous work [Silva and Marques, 2012]
we presented a neural network-based framework for data
stream mining that explored the two-phased methodology,
where the SOM produced offline models. In this paper we
advocate the development of a standalone Ubiquitous SOM
(UbiSOM), capable of producing models in an online
fashion, to be integrated in the framework. This allows
derived knowledge to be accessible at any time.</p>
      <p>The Self-Organizing Map is a well-established
data-mining algorithm with hundreds of applications across
numerous scientific domains for tasks of classification,
clustering, and detection of non-linear relationships between
features [Oja et al., 2003]. It can be visualized as a
sheet-like neural-network array whose neurons become specifically
tuned to various input vectors (observations) in an orderly
fashion. The SOM projects high-dimensional data
onto a 2D lattice while preserving topological relationships
among the input data, making it a data-mining tool of
choice [Vesanto, 1999] for clustering, data inspection,
and/or classification. The powerful visualization techniques
for SOM models allow the detection of complex cluster
structures and of non-linear relationships between features,
and even allow the clustering of time series.</p>
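      <p>The mechanics just described (best-matching unit, lattice neighbourhood, orderly tuning) can be sketched as a minimal classical SOM training loop. All names and parameter values below are illustrative assumptions, not a specific published implementation.</p>

```python
import numpy as np

def som_train(data, rows=10, cols=10, iters=1000, a0=0.5, s0=3.0, seed=0):
    """Classical SOM sketch: for each input, find the best-matching unit
    (BMU) and pull it and its lattice neighbours toward the input."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.random((rows * cols, dim))
    # 2D lattice coordinates of each neuron
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        alpha = a0 * (1 - t / iters)          # monotonically decreasing learning rate
        sigma = s0 * (1 - t / iters) + 0.5    # shrinking neighbourhood radius
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)   # squared lattice distance to BMU
        h = np.exp(-d2 / (2 * sigma ** 2))             # Gaussian neighbourhood function
        weights += alpha * h[:, None] * (x - weights)
    return weights.reshape(rows, cols, dim)
```

      <p>The returned array maps each lattice position to a prototype in input space, which is what the visualization techniques mentioned above operate on.</p>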
      <p>As with most standard data mining methods, classical
SOM training algorithms are tailored to revisit the data
several times to build good models. As training progresses,
the learning parameters are decreased monotonically over
time through one of a variety of decreasing functions. This
is required for the network to converge to a topologically
ordered state and to estimate the input space density. The
consequence is that the map loses plasticity over time: if a
training sample presented at a later point in time is very
different from what the network has learned so far, the network
cannot represent this new data appropriately, because the
parameters no longer allow large updates. In
ubiquitous environments data is expected to be presented to the
network over time, i.e., the network should learn gradually
and derived knowledge should be accessible at any time. This
means that the SOM must retain an indefinite
plasticity over time, with the ability to incorporate data very different
from what it has learned at a particular time, i.e., to be in
conformance with the “dynamic environment” requirement.
Ubiquitous SOMs, i.e., self-organizing maps tailored for
ubiquitous environments with streaming data, should define
those parameters not based on time t, but on the network
error for a particular observation.</p>
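      <p>The monotone decay described above can be sketched as follows; the exact decay functions vary between implementations, and these exponential schedules and their constants are illustrative assumptions. Late in training the update magnitude becomes negligible, which is precisely the loss of plasticity discussed above.</p>

```python
# Typical monotone decay schedules for the SOM learning parameters
# (a sketch; the exact functions and constants vary by implementation).

def alpha(t, t_max, a0=0.5, af=0.01):
    """Exponentially decaying learning rate, from a0 down to af."""
    return a0 * (af / a0) ** (t / t_max)

def sigma(t, t_max, s0=5.0, sf=0.5):
    """Exponentially shrinking neighbourhood radius, from s0 down to sf."""
    return s0 * (sf / s0) ** (t / t_max)

# alpha(0, 10_000) ≈ 0.5 while alpha(10_000, 10_000) ≈ 0.01: a novel
# observation arriving late can no longer reshape the map.
```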
      <p>An underused variant of the SOM, the
parameterless SOM (PLSOM) [Berglund, 2010], was first introduced
to address the difficulty of estimating the initial learning
parameters. The PLSOM has only one parameter (the
neighborhood range) that needs to be specified at the beginning of the
training process, after which the learning rate and
neighborhood radius are estimated dynamically at each iteration.
The basic idea behind the PLSOM is that for an input
pattern that the network already represents well, there is no need
for large adjustments: the learning rate and neighborhood radius
are kept small. On the other hand, if an input vector is very
dissimilar from what was seen previously, then those
parameters are raised to produce large adjustments. However, in
its current form, the PLSOM fails to map the input space density
onto the 2D lattice (Figure 1). This undermines the
visualization capabilities of the PLSOM, namely for cluster
detection. Also, by estimating the learning
parameters solely from the network error for a given observation,
it is very sensitive to outliers.</p>
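      <p>The error-driven idea can be sketched in a single update step: the learning rate and neighbourhood radius are derived from the normalised network error for the current observation rather than from time. This is a simplified sketch of the PLSOM principle, not Berglund's exact formulas; here `beta` plays the role of the single neighbourhood-range parameter.</p>

```python
import numpy as np

def plsom_step(weights, grid, x, state, beta=2.0):
    """One PLSOM-style update (a sketch of the principle): scale the
    learning rate and neighbourhood by the normalised map error for
    this observation instead of by elapsed time."""
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    err = np.linalg.norm(x - weights[bmu])            # network error for x
    state["r"] = max(state.get("r", err), err)        # running max error (normaliser)
    eps = err / state["r"] if state["r"] > 0 else 0.0 # normalised error in [0, 1]
    sigma = beta * np.log(1 + eps * (np.e - 1)) + 1e-6  # error-driven radius
    d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)      # squared lattice distance
    h = np.exp(-d2 / sigma ** 2)
    weights += eps * h[:, None] * (x - weights)       # error-driven learning rate
    return weights, eps
```

      <p>The sensitivity to outliers mentioned above is visible in this sketch: a single extreme observation inflates both the running maximum and the normalised error, triggering a large, possibly unwarranted, adjustment.</p>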
      <p>Nevertheless, this variant of the SOM retains an indefinite
plasticity, which allows it to react, at any point in time, to
input samples very different from what has been presented
before; it also converges faster to an initial globally ordered state
of the lattice. These two capabilities make the PLSOM an
interesting starting point for the proposed goal.</p>
      <p>Concept drift means that the concept about which data is
being collected may shift from time to time, each time after
some minimum period of stability. The
evidence of drift in a concept is reflected in the training
samples (e.g., a change of mean, variance, and/or correlation). Old
observations, which reflect the past behavior of the phenomenon
under observation, become irrelevant to its current
state [Gama, 2010]. In [Silva et al., 2012] we
addressed concept drift detection using a different type of
neural network, namely Adaptive Resonance Theory (ART)
networks; Figure 2 illustrates its applicability to financial time
series. It works by measuring the quantization error of the last
built micro-cluster ART model over a predefined number of
previous ones. We propose to combine the ideas of the PLSOM
algorithm with the network error as an input to a concept
drift module, either ANN-based or not. While the concept is
stable, the learning parameters are decreased
monotonically so as to map the input space density; when the concept
begins to drift, the parameters are raised so
as to cope with the different observations. If the
underlying, possibly non-stationary, distribution is drifting rapidly,
maintaining higher learning parameters will, consequently, make
the model “forget” old observations that are irrelevant to the
current state.</p>
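      <p>One simple way to derive such a drift signal from the network error can be sketched as a monitor that compares a short-term mean of the quantization error against its long-term mean. This is an illustrative heuristic of ours, distinct from the ART-based detector of [Silva et al., 2012]; the window size and threshold are assumptions.</p>

```python
from collections import deque

def drift_monitor(errors, window=50, threshold=2.0):
    """Flag concept drift when the short-term mean quantization error
    exceeds `threshold` times the long-term mean (simple heuristic)."""
    recent = deque(maxlen=window)   # sliding window of recent errors
    long_sum, long_n = 0.0, 0       # running long-term statistics
    flags = []
    for e in errors:
        recent.append(e)
        long_sum += e
        long_n += 1
        short_mean = sum(recent) / len(recent)
        long_mean = long_sum / long_n
        # only flag once the long-term baseline is established
        flags.append(long_n > window and short_mean > threshold * long_mean)
    return flags
```

      <p>While such a monitor reports stability, the learning parameters can be decayed as usual; when it fires, they can be raised again, implementing the behavior described above.</p>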
      <p>In conclusion, ANN methods exhibit some advantages in
ubiquitous data mining: they can adapt to
changing environments; they can generalize from
what they have learned; and through the ANN error it is
possible to determine whether newly arriving information is very
different from what has been learned so far. A purely online
Ubiquitous Self-Organizing Map (UbiSOM) that can learn
non-stationary distributions is relevant for data stream
mining. The SOM mining capabilities
greatly surpass those of k-means, without introducing a large
computational overhead. Measuring concept drift as a way to
estimate the learning parameters of the learning algorithm is,
in our belief, a promising path.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Aggarwal et al.,
          <year>2003</year>
          ]
          <string-name>
            <given-names>C.C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          , J. Han,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.S.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>A framework for clustering evolving data streams</article-title>
          .
          <source>In VLDB</source>
          , pages
          <fpage>81</fpage>
          -
          <lpage>92</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Berglund, 2010]
          <string-name>
            <given-names>E.</given-names>
            <surname>Berglund</surname>
          </string-name>
          .
          <article-title>Improved PLSOM algorithm</article-title>
          .
          <source>Applied Intelligence</source>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ):
          <fpage>122</fpage>
          -
          <lpage>130</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Gama, 2010]
          <string-name>
            <given-names>João</given-names>
            <surname>Gama</surname>
          </string-name>
          .
          <source>Knowledge discovery from data streams</source>
          . Chapman &amp; Hall/CRC Data Mining and Knowledge Discovery Series
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Guha et al.,
          <year>2003</year>
          ]
          <string-name>
            <given-names>S.</given-names>
            <surname>Guha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meyerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Motwani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>O'Callaghan</surname>
          </string-name>
          .
          <article-title>Clustering data streams: Theory and practice</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          , pages
          <fpage>515</fpage>
          -
          <lpage>528</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Kohonen, 1982]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kohonen</surname>
          </string-name>
          .
          <article-title>Self-organized formation of topologically correct feature maps</article-title>
          .
          <source>Biological cybernetics</source>
          ,
          <volume>43</volume>
          (
          <issue>1</issue>
          ):
          <fpage>59</fpage>
          -
          <lpage>69</lpage>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Oja et al.,
          <year>2003</year>
          ]
          <string-name>
            <given-names>Merja</given-names>
            <surname>Oja</surname>
          </string-name>
          , Samuel Kaski, and
          <string-name>
            <given-names>Teuvo</given-names>
            <surname>Kohonen</surname>
          </string-name>
          .
          <article-title>Bibliography of self-organizing map (SOM) papers: 1998-2001</article-title>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Silva and Marques, 2012]
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Silva</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nuno</given-names>
            <surname>Marques</surname>
          </string-name>
          .
          <article-title>Neural network-based framework for data stream mining</article-title>
          .
          <source>In Proceedings of the Sixth Starting AI Researchers' Symposium</source>
          . IOS Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Silva et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Silva</surname>
          </string-name>
          , Nuno Marques, and
          <string-name>
            <given-names>Gisele</given-names>
            <surname>Panosso</surname>
          </string-name>
          .
          <article-title>Applying neural networks for concept drift detection in financial markets</article-title>
          .
          <source>In ECAI2012, Ubiquitous Data Mining Workshop</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Vesanto, 1999]
          <string-name>
            <given-names>J.</given-names>
            <surname>Vesanto</surname>
          </string-name>
          .
          <article-title>SOM-based data visualization methods</article-title>
          .
          <source>Intelligent Data Analysis</source>
          ,
          <volume>3</volume>
          :
          <fpage>111</fpage>
          -
          <lpage>126</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>