=Paper=
{{Paper
|id=Vol-1088/paper10
|storemode=property
|title=Ubiquitous Self-Organizing Maps
|pdfUrl=https://ceur-ws.org/Vol-1088/paper10.pdf
|volume=Vol-1088
|dblpUrl=https://dblp.org/rec/conf/ijcai/SilvaM13
}}
==Ubiquitous Self-Organizing Maps==
Ubiquitous Self-Organizing Maps
Bruno Silva Nuno Marques
DSI/ESTSetúbal DI/FCT - UNL
Instituto Politécnico de Setúbal Portugal
Portugal nmm@di.fct.unl.pt
bruno.silva@estsetubal.ips.pt
Knowledge discovery in ubiquitous environments are usu- among the input data, thus electing it as a data-mining tool of
ally conditioned by the data stream model, e.g., data is poten- choice [Vesanto, 1999], either for clustering, data inspection
tially infinite, arrives continuously and is subject to concept and/or classification. The powerful visualization techniques
drift. These factors present additional challenges to standard for SOM models allow the detection of complex cluster struc-
data mining algorithms. Artificial Neural Networks (ANN) tures, detection of non-linear relationships between features
models are still poorly explored in these settings. and even allow the clustering of time series.
State-of-the-art methods to deal with data streams are As with most standard data mining methods, classical
single-pass modifications of standard algorithms, e.g., K- SOM training algorithms are tailored to revisit the data sev-
means for clustering, and involve some relaxation of the qual- eral times to build good models. As training progresses, ex-
ity of the results, i.e., since the data cannot be revisited to re- isting learning parameters are decreased monotonically over
fine the models, the goal is to achieve good approximations time through one of a variety of decreasing functions. This
[Gama, 2010]. In [Guha et al., 2003] an improved single pass is required for the network to converge to a topological or-
k-means algorithm is proposed. However, k-means suffers dered state and to estimate the input space density. The con-
from the problem that the initial k clusters have to be set either sequence is that the maps lose plasticity over time, i.e., if a
randomly or through other methods. This has a strong impact training sample presented at a later point in time is very dif-
on the quality of the clustering process. CluStream [Aggar- ferent from what it has learned so far it does not have the
wal et al., 2003] is a framework that targets high-dimensional ability to represent this new data appropriately because these
data streams in a two-phased approach, where an online phase parameters do not allow large updates at that time. In ubiqui-
produces micro-clusterings of the incoming data, while pro- tous environments data is expected to be presented to the net-
ducing on-demand offline models of data also with k-means. work over time, i.e., the network should be learning gradually
In this position paper we address the use of Self- and derived knowledge should be accessible at any time. This
Organizing Maps (SOM) [Kohonen, 1982] and argue its means that the SOM must be able to retain an indefinite plas-
strengths over current methods and directions to be explored ticity over time, with the ability to incorporate very different
on its adaptation to ubiquitous environments, which involve data from what it has learned at a particular time, i.e., to be in
dynamic estimation of the learning parameters based on mea- conformance with the “dynamic environment” requirement.
suring concept drift on, usually, non-stationary underlying
distributions. In a previous work [Silva and Marques, 2012] Ubiquitous SOMs , i.e., self-organizing maps tailored for
we presented a neural network-based framework for data ubiquitous environments with streaming data, should define
stream mining that explored the two-phased methodology, those parameters not based on time t, but in the error on the
where the SOM produced offline models. In this paper we network for a particular observation.
advocate the development of a standalone Ubiquitous SOM An underused variant of the SOM, called the parameter-
(UbiSOM), that is capable of producing models in an online less SOM (PLSOM) [Berglund, 2010]), was first introduced
fashion, to be integrated in the framework. This allows de- to address the difficulty os estimating the initial learning pa-
rived knowledge to be accessible at any time. rameters. The PLSOM has only one parameter β (neighbor-
hood range) that needs to be specified in the beginning of the
The Self-Organizing Map is a well-established data- training process, after which α (learning rate) and σ (neigh-
mining algorithm with hundreds of applications throughout borhood radius) are estimated dynamically at each iteration.
enumerate scientific domains for tasks of classification, clus- The basic idea behind the PLSOM is that for an input pat-
tering and detection of non-linear relationships between fea- tern that the network already represents well, there is no need
tures [Oja et al., 2003]. It can be visualized as a sheet- for large adjustments – learning rate and neighborhood radius
like neural-network array, whose neurons become specifically are kept small. On the other hand, if an input vector is very
tuned to various input vectors (observations) in an orderly dissimilar of what was seen previously, then those parame-
fashion. The SOM is able to project high-dimensional data ters are adjusted to produce large adjustments. However, in
onto a 2D lattice, while preserving topological relationships its current form, it fails in mapping the input space density
Figure 1: Learned Gaussian distribution for the classical
SOM (left) and for the PLSOM (right). The later does not
maintain the density of the input space, which undermines Figure 2: a) Measuring concept drift over a stream of finan-
the use of visualization techniques for cluster detection and cial statistical indicators in [Silva et al., 2012]. b) The corre-
feature correlation. sponding time series of the Dow Jones Index.
onto the 2D lattice (Figure 1). This undermines the visual- different from what it has learned so far. A purely online
ization capabilities of the PLSOM, namely for cluster detec- Ubiquitous Self-Organizing Map (UbiSOM) that can learn
tion. Also, by estimating the values of the learning parame- non-stationary distributions is relevant for data stream min-
ters solely based on the network error for a given observation, ing, namely because of its c. The SOM mining capabilities
it is very sensible to outliers. greatly surpass K-means, without introducing a big overhead
Nevertheless, this variant of the SOM retains an indefinite in computation needs. Measuring concept drift as a way to
plasticity, which allows the SOM to react to very different in- estimate the learning parameters of the learning algorithm is,
put samples from what has been presented to it, at any point in our belief, a promising path.
in time; and converges faster to an initial global ordered state
of the lattice. These two capabilities makes PLSOM an inter- References
esting starting point for the proposed goal.
[Aggarwal et al., 2003] C.C. Aggarwal, J. Han, J. Wang, and
P.S. Yu. A framework for clustering evolving data streams.
Concept drift means that the concept about which data is In VLDB, pages 81–92, 2003.
being collected may shift from time to time, each time after
some minimum permanence. Changes occur over time. The [Berglund, 2010] E. Berglund. Improved PLSOM algorithm.
evidence of drift in a concept is reflected in the training sam- Applied Intelligence, 32(1):122–130, 2010.
ples (e.g., change of mean, variance and/or correlation). Old [Gama, 2010] João Gama. Knowledge discovery from data
observations, which reflect the behavior in nature in the past, streams. Chapman & Hall/CRC Data Mining and Knowl-
become irrelevant to the current state of the phenomena un- edge Discovery Series, 2010.
der observation [Gama, 2010]. In [Silva et al., 2012] we ad- [Guha et al., 2003] S. Guha, A. Meyerson, N. Mishra,
dressed concept drift detection using a different type of neu- R. Motwani, and L. O’Callaghan. Clustering data streams:
ral network, namely Adaptive Resonance Theory (ART) net- Theory and practice. IEEE Transactions on Knowledge
works. Figure 2 illustrates its applicability to financial time and Data Engineering, pages 515–528, 2003.
series. It works by measuring the quantization error of the last
built micro-cluster ART model, over a predefined number of [Kohonen, 1982] T. Kohonen. Self-organized formation of
previous ones. We propose to use the ideas of the PLSOM topologically correct feature maps. Biological cybernetics,
algorithm using the network error as an input to a concept 43(1):59–69, 1982.
drift module, either ANN-based or not. While the concept is [Oja et al., 2003] Merja Oja, Samuel Kaski, and Teuvo Ko-
stable, the learning parameters are being decreased monoton- honen. Bibliography of self-organizing map (som) papers:
ically so as to map the input space density; when the concept 1998-2001, 2003.
begins to drift the parameters are adjusted to higher values so [Silva and Marques, 2012] Bruno Silva and Nuno Marques.
as to cope with the different observations. If the, possibly, un-
Neural network-based framework for data stream mining.
derlying non-stationary distribution is drifting rapidly, main-
In Proceedings of the Sixth Starting AI Researchers’ Sym-
taining higher learning parameters will, consequently, make
posium. IOS Press, 2012.
the model “forget” old and irrelevant observations to the cur-
rent state. [Silva et al., 2012] Bruno Silva, Nuno Marques, and Gisele
Panosso. Applying neural networks for concept drift de-
tection in financial markets. In ECAI2012, Ubiquitous
Conclusion ANN methods exhibit some advantages in Data Mining Workshop, 2012.
ubiquitous data mining: they have the ability to adapt to
changing environments; have the ability to generalize from [Vesanto, 1999] J. Vesanto. SOM-based data visualization
what they have learned; and through the ANN error it is pos- methods. Intelligent-Data-Analysis, 3:111–26, 1999.
sible to determine if the new information that arrives is very