<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A study of SOM clustering software implementations</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">A</forename><forename type="middle">B</forename><surname>Adeyemo</surname></persName>
							<email>sesanadeyemo@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="institution">University of Ibadan Nigeria</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">A study of SOM clustering software implementations</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">A5A115B500F002FCC02E819ADFFD1D5B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Comparative Analysis</term>
					<term>Clustering</term>
					<term>Self Organizing Maps</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Clustering algorithms generally suffer from some well-known problems that Self-Organizing Map (SOM) algorithms are adept at handling. While there are many variants of the SOM algorithm, software programmes that implement SOM algorithms have tended to give varying results even when tested on the same data sets. This can have serious implications when the goal of the clustering is novelty detection. In this study the performance of several SOM clustering software implementations was compared and the results are presented.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>CCS Concepts</head><p>• General and reference ➝ Computing tools and techniques ➝ Empirical studies</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>In the clustering process data is grouped in such a way that the intra-cluster similarity is maximized while the inter-cluster similarity is minimized. Data can be described by either categorical or numeric features. Due to the differences in the characteristics of these two kinds of data, attempts to develop criterion functions for mixed data have not been very successful <ref type="bibr" target="#b14">[15]</ref>. There are two widely used clustering methods: the hierarchical and the non-hierarchical (partitional) methods. The hierarchical clustering process is categorized as divisive when a large data set is divided into several small groups, and agglomerative when small groups are merged to create larger clusters. Self-Organizing Maps (SOM) are competitive networks that provide a "topological" mapping from the input space to the clusters <ref type="bibr" target="#b4">[4]</ref>. The SOM was inspired by the way in which various human sensory impressions are neurologically mapped into the brain such that spatial or other relations among stimuli correspond to spatial relations among the neurons.</p><p>In a SOM, the neurons (clusters) are organized into a grid which is usually two-dimensional, but sometimes one-dimensional or, rarely, three- or more-dimensional. The reason for using one- and two-dimensional grids is that structures of higher dimensionality cause problems with data display and cannot be shown on a monitor. The SOM algorithm is a variant of multidimensional vector clustering, of which the K-means clustering algorithm is another example <ref type="bibr">[9]</ref>.</p><p>The SOM neural network uses a competitive learning algorithm and is a method for unsupervised learning, based on a grid of artificial neurons whose weights are adapted to match input vectors in a training set. 
The SOM algorithm is fed with feature vectors, which can be of any dimension. The algorithm for the training of the SOM <ref type="bibr" target="#b4">[4]</ref> is explained easily in terms of a set of artificial neurons, each having its own physical location on the output map, which take part in a winner-take-all process where the node with its weight vector closest to the vector of inputs is declared the winner and its weights are adjusted, making them closer to the input vector. In each training step, one sample vector "x" from the input data set is chosen randomly and a similarity measure is calculated between it and all the weight vectors of the map. The Best-Matching Unit (BMU), denoted as "c", is the unit whose weight vector has the greatest similarity with the input sample "x" (figure <ref type="figure" target="#fig_0">1</ref>). The similarity is usually defined by means of a distance measure, usually the Euclidean distance. The BMU is defined mathematically as the processing element "c" for which d(x, mc) = min i {d(x, mi)} ………………… (1) where d is the distance measure and mi is the weight (reference) vector of unit i.</p><p>Each node has a set of neighbors. When a node wins a competition, the neighbors' weights are also changed, but not as much as that of the winning node. The further a neighbor is from the winner, the smaller its weight change. The SOM update rule for the weight vector of unit i is given mathematically as: mi(t+1) = mi(t) + hc(x),i(t)[x(t) − mi(t)] ………………… (2) where t represents the sample index for each presentation of a sample "x" and hc(x),i represents the neighborhood function around the winner unit "c", with neighborhood radius r(t).</p><p>The neighborhood function is like a smoothing kernel that is time-variable. It is a decreasing function of the distance between the ith and cth reference vectors on the map grid. 
The neighborhood function is usually the Gaussian function, which can be expressed mathematically as: hc(x),i(t) = α(t) exp(−║ri − rc║² / 2σ²(t)) ………………… (3) where α(t) represents the learning rate factor, which takes values 0 &lt; α(t) &lt; 1, and σ(t) represents the width of the neighborhood function, which decreases monotonically with the regression steps.</p><p>A simpler definition of the neighbourhood function given by Kohonen <ref type="bibr" target="#b4">[4]</ref> is:</p><formula xml:id="formula_0">hc(x),i = α(t) ……………………………………………………. (4)</formula><p>if ║ri − rc║ is smaller than a given radius around node "c" (the radius is also a monotonically decreasing function of the regression steps), but otherwise hc(x),i = 0. α(t) is a diminishing function of time: at the beginning of the learning procedure it is fairly large, but it is made to gradually shrink during learning, so that towards the end of learning only a single winning processing element is trained. A linearly diminishing function of time is usually used. The learning process consists of winner selection by Equation (1) and adaptation of the synaptic weights by Equation (2). This process is repeated for each input vector, usually for a large number of cycles, with different inputs producing different winners. The network therefore associates output nodes with groups or patterns in the input data set. The SOM algorithm is very simple and allows for many subtle adaptations.</p><p>There are some visual displays that are used to "determine" where the natural cluster boundaries are in the SOM. Some of the visual tools that can be used are Histograms <ref type="bibr" target="#b5">[6]</ref>, Component Plane displays [3], and U-matrix, P-matrix and U*-matrix displays [10], <ref type="bibr">[11]</ref>, <ref type="bibr">[12]</ref>, <ref type="bibr">[13]</ref>. An important concept in interpreting these displays is the interaction of two properties of the SOM: the neighborhood relationship and the density mapping. 
Neighboring neurons in the SOM cannot be too far away from each other (in order to maintain their similarity), but the SOM also tends to place more neurons in areas of high input density (for example, natural clusters). Because of this, some neurons will be placed in the areas between natural clusters, which are typically areas of low input density (so that the map can "stretch" between clusters).</p><p>The standard SOM algorithm uses numeric variables and the Euclidean distance function. The arithmetic operations used during the learning phase for the update of the feature vectors cannot be applied to categorical values, so the SOM was not directly designed to work with categorical variables due to this limitation of the learning laws. The method usually adopted is to translate categories to numeric values during data pre-processing and then train the standard SOM algorithm on the transformed data <ref type="bibr">[2]</ref>. The Kohonen SOM clustering algorithm has also been used for classification purposes with remarkable results. There is a fundamental difference between the clustering process and the classification process: clustering is an unsupervised process while classification is supervised. Usually data clustering is used as a pre-processor for classification purposes <ref type="bibr">[8]</ref>.</p><p>A rich variety of versions of the basic SOM algorithm have been proposed. Some of the variants aim at improving the preservation of topology by using more flexible map structures instead of the fixed grid; these methods, however, cannot be used for visualization as easily as the regular grid. Other variants aim at reducing the computational complexity of the SOM <ref type="bibr">[3]</ref>. 
Experiments using different distance measures, map topologies, and training parameters such as the learning rate and neighbourhood function can be carried out.</p><p>Using identical settings, training a SOM over different runs can lead to different mappings because of the random initialisation. Yet it has been shown that the conclusions drawn from the map remain remarkably consistent, which makes it a very useful tool in many different circumstances <ref type="bibr">[14]</ref>. Some of the desirable features that good SOM clustering software should have include:</p><p>1. Being able to set the neighborhood kernel function and the start value for the neighborhood function (learning radius): The neighborhood function determines how strongly the processing elements are connected to each other. Neighborhoods of different sizes in different neuron configurations (e.g. rectangular and hexagonal lattices) can be used. The simplest neighborhood function is the bubble (winner-takes-all): it is constant (or 1) over the whole neighborhood of the winner unit and zero elsewhere.</p><p>Usually the neighbourhood function is expressed as a Gaussian function and, as expected, using the winner-takes-all function retrieves fewer clusters than the Gaussian function.</p><p>2. Being able to set the activation function and weight initialization methods: Before the training, initial values are given to the prototype vectors of the SOM. The SOM is very robust with respect to the initialization process; however, when properly accomplished it allows the algorithm to converge faster to a good solution. 
Initialization procedures that have been used are: random initialization, where the weight vectors are initialized with small random values; sample initialization, where the weight vectors are initialized with random samples drawn from the input data set; and linear initialization, where the weight vectors are initialized in an orderly fashion along the linear subspace spanned by the two principal eigenvectors of the input data set.</p><p>3. Being able to set the cooling strategy used during training: for example linear or exponential.</p><p>4. Being able to set the distance measure to be used, for example Euclidean, Manhattan or maximum value: The distance measure between data points is an important component of a clustering algorithm. If the components of the data instance vectors are all in the same physical units then it is possible to use the simple Euclidean distance metric to successfully group similar data elements. The Euclidean distance in a two- or three-dimensional space is the actual geometric distance between objects in the space. However, even the Euclidean distance can sometimes be misleading, because of the way the distances between the single components of the data feature vectors are combined into the single distance measure used for clustering; different formulas lead to different clusterings. Therefore, domain knowledge must be used to guide the formulation of a suitable distance measure for each particular application.</p><p>5. Being able to set the scaling technique to be used: for example the z-transform, (0,1) transform, (1,-1) transform or none, depending on the clustering goal and data set.</p><p>6. Being able to set the starting and stopping learning rate:</p><p>The learning rate is a decreasing function of time taking values in [0,1]. 
The learning rate can be expressed as a linear function of time or as a function inversely proportional to time. Using the inverse function ensures that all input samples have approximately equal influence on the training result. Some learning rate functions that have been implemented are the linear, inverse-of-time, and power series functions.</p><p>7. Being able to set the training algorithm to be used: for example batch, on-line, hybrid etc. The batch algorithm has been shown to be faster <ref type="bibr" target="#b4">[4]</ref> than the normal sequential algorithm (and the results are just as good or even better).</p><p>8. Good data visualization options, for example histograms, Hinton charts, weight charts (maps), U-Matrix, P-Matrix etc., together with good result analysis and presentation functions: computation of vital statistics for evaluating the quality of the clustering, for example the mean, standard deviation (or variance), correlation coefficient, t-test etc.</p><p>This work presents a comparative study of the performance of some SOM clustering software when tested on the same data set. Results are presented, together with reasons for the observed variations. The study also presents the desirable features that standard SOM software should have. Clusters were generated using the three software packages, and the arithmetic mean of each cluster group was computed. The arithmetic mean is a measure of central tendency which describes the central location of the data; it is usually used with other statistical measures such as the standard deviation because it can be affected by extreme values in the data set and therefore be biased. The standard deviation describes the spread of the data and is a popular measure of dispersion: it measures the average distance between a single observation and the mean.</p></div>
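As a concrete illustration of Equations (1)–(3) and of the training parameters discussed above (learning rate, neighborhood radius, cooling strategy), the core SOM training loop can be sketched in a few lines of Python with NumPy. This is a minimal sketch assuming random initialization, a rectangular grid, linear cooling and the Euclidean distance; it is not the implementation used by any of the packages studied here.

```python
import numpy as np

def train_som(data, grid_shape=(10, 10), epochs=100,
              alpha0=0.9, sigma0=3.0, seed=0):
    """Minimal SOM training sketch: BMU search (Eq. 1), Gaussian
    neighborhood (Eq. 3) and weight update (Eq. 2), with a linearly
    cooled learning rate and neighborhood radius."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    dim = data.shape[1]
    # Random initialization of the prototype (weight) vectors.
    weights = rng.random((rows * cols, dim))
    # Physical grid coordinates r_i of each unit, used by h_{c(x),i}.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)],
                      dtype=float)
    n_steps = epochs * len(data)
    for t in range(n_steps):
        frac = t / n_steps
        alpha = alpha0 * (1.0 - frac)            # linear cooling of alpha(t)
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking radius sigma(t)
        x = data[rng.integers(len(data))]        # random sample vector "x"
        # Eq. 1: the BMU c minimizes the Euclidean distance d(x, m_i).
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Eq. 3: Gaussian neighborhood around the BMU on the grid.
        grid_d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = alpha * np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Eq. 2: pull every unit toward x, weighted by h.
        weights += h[:, None] * (x - weights)
    return weights
```

Swapping the Gaussian `h` for a bubble kernel, or the linear cooling for an inverse-of-time schedule, corresponds to the parameter choices listed in features 1, 3 and 6 above.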
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">MATERIALS AND METHODS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">RESULTS AND DISCUSSION</head><p>The meteorological data was clustered using the NNClust SOM clustering software with a starting learning rate of 0.9 and was trained over 100 epochs. The software accepts only numeric values; non-numeric values are treated as missing values, which are replaced by the column mean. The software was set to identify a maximum of ten clusters; however, only eight clusters were generated. The software uses the number of clusters specified to create the SOM grid. The mean and standard deviation of the eight clusters were computed.</p><p>Increasing the training cycles did not improve the results. Table <ref type="table" target="#tab_2">1</ref> presents the summary of the eight clusters, while figure <ref type="figure" target="#fig_1">2</ref> presents the chart of the cluster means. The meteorological data was also trained using the Pittnet software with a starting learning rate of 0.9 and was set to train for 100 epochs, although the software stops training as soon as the maximum number of clusters has been generated. The software requires the user to specify the expected number of clusters a priori; this number is used in conjunction with the number of input signals (attributes) to determine the SOM grid size. The expected number of clusters was set to ten, but the software identified only four clusters. The mean and standard deviation of the clusters were computed. Table <ref type="table" target="#tab_3">2</ref> presents the summary of the clusters, while figure <ref type="figure">3</ref> presents the chart of the cluster means.</p><p>The RapidMiner Studio software was used to cluster the meteorological data set using a starting learning rate of 0.9 and was trained over 100 epochs. The expected number of clusters was set at ten and the software generated ten clusters. 
Table <ref type="table" target="#tab_4">3</ref> presents the summary of the cluster means with their standard deviations while figure <ref type="figure">4</ref> presents a chart of their cluster means.</p></div>
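The per-cluster means and standard deviations reported in Tables 1–3 can be reproduced from any of the tools' outputs in the same way: group the records by their assigned cluster and summarize each attribute. A small sketch (the `assignments` array is hypothetical; in practice it would hold the BMU label of each record):

```python
import numpy as np

def cluster_summary(data, assignments):
    """Mean and sample standard deviation of each attribute,
    per cluster, as reported in the result tables."""
    summary = {}
    for label in np.unique(assignments):
        members = data[assignments == label]
        summary[label] = (members.mean(axis=0),
                          members.std(axis=0, ddof=1))  # sample SD (n - 1)
    return summary
```

The `ddof=1` choice matches the sample standard deviation used throughout the tables; omitting it would give the population formula instead.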
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Discussion of Results</head><p>The quality of the clusters identified in the data by the three software packages can be inferred from a comparison of the mean and standard deviation of the clusters. If the value of the standard deviation is low, then the clustered records are within the same range; if the value is high, this suggests the presence of outliers in the clustered data records. For example, table <ref type="table" target="#tab_5">4</ref> presents the clustered records for cluster 2 (table 1) of the NNClust software, which is representative of the trend observed in the clusters identified by the software. Interpretation of the cluster is indecisive when the values in the Total Rainfall field are considered: the field has a mean of 142.05 and a standard deviation of 136.011711. The same trend is observed in the clusters identified by the Pittnet software in table 2. Table <ref type="table" target="#tab_6">5</ref> presents the records for cluster 4 (table <ref type="table" target="#tab_3">2</ref>) of the Pittnet software cluster results. It can be observed that the cluster consists of data records which all have the same value for the FireDangerIndex attribute. However, the Total Rainfall field has a mean value of 39.74444 and a standard deviation of 43.34732; this high standard deviation implies that there are outlier data values in the clustered records.</p><p>The clusters identified by the RapidMiner software presented in table <ref type="table" target="#tab_4">3</ref> were easier to interpret. They followed the expected rainfall pattern which is known for the region where the data was collected <ref type="bibr" target="#b0">[5]</ref>. Cluster 2 (  </p></div>
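The outlier effect described above is easy to verify by hand: recomputing the statistics from the six Total Rainfall values listed in table 4 reproduces the reported cluster-2 mean of 142.05 and a sample standard deviation of about 136.01, dominated by the single 357.1 mm record.

```python
import math

# Total Rainfall values of the six cluster-2 records in table 4.
rainfall = [60, 357.1, 10, 57, 108.9, 259.3]

mean = sum(rainfall) / len(rainfall)
# Sample standard deviation (n - 1 in the denominator).
variance = sum((x - mean) ** 2 for x in rainfall) / (len(rainfall) - 1)
sd = math.sqrt(variance)

print(round(mean, 2), round(sd, 2))  # → 142.05 136.01
```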
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">CONCLUSION</head><p>Some of the problems found in the literature about clustering algorithms are the following. Most clustering techniques are based on distance calculations, which are very sensitive to the ranges of the variables, so the values have to be normalized; normalization, however, is a subjective function, and these transformations cannot be carried out without creating biases.</p><p>The presence of outliers in data sets creates problems for data clustering based on distance calculations when they have not been identified and removed from the data set. Handling categorical variables (non-numeric data, nominal data, or nominal variables) is a problem for most clustering algorithms, and even when data encoding methods are used they can introduce extra biases due to the number of values which the encoding introduces into the categorical variables. The selection of variables also has a large influence on clustering results; while assigning different weights to variables and categorical values can be used, when many variables and categorical values are involved this can affect the clustering quality. Capturing the patterns (or behaviors) hidden inside time-varying variables and modeling them is another problem, and most clustering techniques do not possess this predictive modeling capability. Finally, most clustering techniques were developed for laboratory-generated simple data sets consisting of a few to several numerical variables, hence they can't be used for large data analyses that involve many complex categorical data.</p><p>Most common implementations of data clustering algorithms suffer from these problems; SOMs, however, are very robust and are adept at handling them, although this also depends on the goal of the algorithm's implementation (programming).</p><p>Applications programmed for demonstration purposes cannot be used for large scale projects and some 
implementations are not flexible and do not give users many options. However, if the various implementations of the conventional SOM algorithm (which are usually focused on the goals of the programmer) provide enough options to the user, it is still a very robust algorithm that can be used for numerical, categorical and mixed data sets. Further work in this study is focused on the development of an open, flexible SOM clustering tool with adequate features that can be used for research purposes. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Illustration of the updating of the Best Matching Unit (BMU) of a SOM grid and its neighbors</figDesc><graphic coords="1,317.85,391.90,249.59,210.73" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Chart of NNClust cluster means</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :Figure 4 :</head><label>34</label><figDesc>Figure 3: Chart of Pittnet software cluster means</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Agro-meteorological data for FRIN headquarters, Ibadan, Nigeria was used. The data set had 254 records and the attributes in the data set were: Year (numeric), Month (text), Total Rainfall in millimeters (numeric), Minimum Temperature in Celsius (numeric), Maximum Temperature in Celsius (numeric), Relative Humidity and Fire Danger Index (numeric). The SOM software used were: NNClust, Pittnet Neural Network Educational Software and RapidMiner Studio. The NNClust software was programmed to use only the Gaussian neighbourhood function and the Euclidean distance measure. The user can input the learning rate and starting neighbourhood size. The software automatically normalizes the input data between -1 and 1 and has features for generating data/result statistics and data visualization such as weight maps and radar charts. The Pittnet software also uses the Gaussian neighbourhood function and the Euclidean distance metric. The user also defines the starting learning rate, and the software automatically normalizes the data between 0 and 1. It is a DOS-based program that saves its results in a text file and has no data analysis or data visualization ability. RapidMiner Studio (Community Edition) has facilities for setting the learning rate and neighbourhood radius, and the user can choose whether or not to normalize the data. It also has an array of tools for statistical data analysis and data visualization.</figDesc><table /></figure>
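The two fixed normalization ranges noted above (NNClust's −1 to 1 and Pittnet's 0 to 1) are both min-max rescalings of each numeric column. A sketch of how such a rescaling might be implemented (the function name and the constant-column convention are illustrative, not taken from either tool):

```python
def minmax_scale(values, lo=0.0, hi=1.0):
    """Min-max rescaling of a numeric column into [lo, hi]:
    lo=-1, hi=1 mimics NNClust-style normalization and the
    default lo=0, hi=1 mimics Pittnet-style normalization."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:  # constant column: map every value to the midpoint
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (v - vmin) * (hi - lo) / span for v in values]
```

For example, `minmax_scale([10, 20, 30], -1, 1)` yields `[-1.0, 0.0, 1.0]`. Because the two tools rescale to different ranges, distances between records (and hence the resulting clusters) are not directly comparable across them, which is one plausible source of the variation discussed in the results.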
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>table 3) contained records with only a high FireDangerIndex of 4 as presented in table 6, while cluster 5 (table 3) contains records with the highest recorded Rainfall level in the data set. The other clusters also contained data records which can be categorized by the Rainfall level pattern of the region.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1 : Summary of NNClust clusters</head><label>1</label><figDesc></figDesc><table><row><cell></cell><cell></cell><cell>TotalRainfall</cell><cell>MaxTemp</cell><cell>MinTemp</cell><cell>RH</cell><cell>FireDangerIndex</cell></row><row><cell>Cluster 1</cell><cell>Mean</cell><cell>3.7</cell><cell>32</cell><cell>24</cell><cell>83</cell><cell>2</cell></row><row><cell></cell><cell>SD</cell><cell>0</cell><cell>0</cell><cell>0</cell><cell>0</cell><cell>0</cell></row><row><cell>Cluster 2</cell><cell>Mean</cell><cell>142.05</cell><cell>33.5</cell><cell>24.5</cell><cell>79.33333</cell><cell>2.666666667</cell></row><row><cell></cell><cell>SD</cell><cell>2.61629509</cell><cell>22.627417</cell><cell>16.9706</cell><cell>4.501851</cell><cell>0.516397779</cell></row><row><cell>Cluster 3</cell><cell>Mean</cell><cell>113.313158</cell><cell>31.1236842</cell><cell>31.0605</cell><cell>70.54737</cell><cell>2.5</cell></row><row><cell></cell><cell>SD</cell><cell>69.9895185</cell><cell>15.4557389</cell><cell>11.4404</cell><cell>45.62364</cell><cell>1.246560403</cell></row><row><cell>Cluster 4</cell><cell>Mean</cell><cell>149.99</cell><cell>30.8333333</cell><cell>30.2967</cell><cell>73.75333</cell><cell>2.333333333</cell></row><row><cell></cell><cell>SD</cell><cell>98.1425436</cell><cell>3.53058883</cell><cell>20.0499</cell><cell>25.41582</cell><cell>0.546672274</cell></row><row><cell>Cluster 5</cell><cell>Mean</cell><cell>109.891667</cell><cell>30.6333333</cell><cell>36.1667</cell><cell>64.64444</cell><cell>2.638888889</cell></row><row><cell></cell><cell>SD</cell><cell>92.1210985</cell><cell>4.02073199</cell><cell>24.3938</cell><cell>34.37646</cell><cell>0.723198364</cell></row><row><cell>Cluster 
6</cell><cell>Mean</cell><cell>141.621277</cell><cell>31.7574468</cell><cell>27.0617</cell><cell>73.1617</cell><cell>2.617021277</cell></row><row><cell></cell><cell>SD</cell><cell>97.0359995</cell><cell>2.63056819</cell><cell>13.7078</cell><cell>20.8623</cell><cell>0.644481304</cell></row><row><cell>Cluster 7</cell><cell>Mean</cell><cell>123.545794</cell><cell>31.4411215</cell><cell>29.4963</cell><cell>74.41028</cell><cell>2.411214953</cell></row><row><cell></cell><cell>SD</cell><cell>81.8137003</cell><cell>2.96536463</cell><cell>18.4077</cell><cell>24.4239</cell><cell>0.531165877</cell></row><row><cell>Cluster 8</cell><cell>Mean</cell><cell>175.268966</cell><cell>29.3793103</cell><cell>23.069</cell><cell>86.89655</cell><cell>2.068965517</cell></row><row><cell></cell><cell>SD</cell><cell>85.4901878</cell><cell>1.49794605</cell><cell>1.06674</cell><cell>4.312315</cell><cell>0.257880715</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2 : Summary of the Pittnet software clusters</head><label>2</label><figDesc></figDesc><table><row><cell></cell><cell></cell><cell>TotalRainfall</cell><cell>MaxTemp</cell><cell>MinTemp</cell><cell>RH</cell><cell>FireDangerIndex</cell></row><row><cell>Cluster 1</cell><cell>Mean</cell><cell>50.850001</cell><cell>24.75</cell><cell>63.5</cell><cell>3.9</cell><cell>4</cell></row><row><cell></cell><cell>SD</cell><cell>31.32483</cell><cell>0.070709</cell><cell>12.0208153</cell><cell>0.141421356</cell><cell>0</cell></row><row><cell>Cluster 2</cell><cell>Mean</cell><cell>134.3332</cell><cell>31.7082</cell><cell>23.5984375</cell><cell>82.4218728</cell><cell>2.3828125</cell></row><row><cell></cell><cell>SD</cell><cell>91.137324</cell><cell>2.254123</cell><cell>1.06439596</cell><cell>6.908488013</cell><cell>0.487025284</cell></row><row><cell>Cluster 3</cell><cell>Mean</cell><cell>138.05185</cell><cell>24.64815</cell><cell>84.4074074</cell><cell>2.196296296</cell><cell>2.407407407</cell></row><row><cell></cell><cell>SD</cell><cell>45.668999</cell><cell>15.90804</cell><cell>27.2370968</cell><cell>39.48311832</cell><cell>1.836329785</cell></row><row><cell>Cluster 4</cell><cell>Mean</cell><cell>39.744444</cell><cell>35.55556</cell><cell>23.5555556</cell><cell>59.22222133</cell><cell>4</cell></row><row><cell></cell><cell>SD</cell><cell>43.347321</cell><cell>1.333333</cell><cell>1.74005108</cell><cell>7.120003363</cell><cell>0</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3 : Summary of RapidMiner Studio clusters</head><label>3</label><figDesc></figDesc><table><row><cell></cell><cell></cell><cell>TotalRainfall</cell><cell>MaxTemp</cell><cell>MinTemp</cell><cell>RH</cell><cell>FireDangerIndex</cell></row><row><cell>cluster 0</cell><cell>Mean</cell><cell>42.35385</cell><cell>33.41154</cell><cell>23.99615</cell><cell>78.46153846</cell><cell>2.730769231</cell></row><row><cell></cell><cell>SD</cell><cell>8.192056</cell><cell>2.308823</cell><cell>0.911913</cell><cell>7.798619207</cell><cell>0.603833905</cell></row><row><cell>cluster 1</cell><cell>Mean</cell><cell>13.50513</cell><cell>33.47179</cell><cell>23.80769</cell><cell>77.43589744</cell><cell>2.820512821</cell></row><row><cell></cell><cell>SD</cell><cell>9.379343</cell><cell>2.342845</cell><cell>1.280909</cell><cell>6.302860135</cell><cell>0.451418517</cell></row><row><cell>cluster 2</cell><cell>Mean</cell><cell>7.64</cell><cell>35.36</cell><cell>23.42</cell><cell>55.2</cell><cell>3.8</cell></row><row><cell></cell><cell>SD</cell><cell>16.15873</cell><cell>17.96476</cell><cell>13.16786</cell><cell>40.93966268</cell><cell>1.299899072</cell></row><row><cell>cluster 3</cell><cell>Mean</cell><cell>57.94667</cell><cell>25.35333</cell><cell>78.13333</cell><cell>2.726666667</cell><cell>2.933333333</cell></row><row><cell></cell><cell>SD</cell><cell>13.23034</cell><cell>15.63488</cell><cell>11.11308</cell><cell>32.15964741</cell><cell>1.361648053</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 4 : Sample NNClust software cluster result</head><label>4</label><figDesc></figDesc><table><row><cell>Year</cell><cell>Months</cell><cell cols="2">TotalRainfall MaxTemp</cell><cell>MinTemp</cell><cell>RH</cell><cell>FireDangerIndex</cell></row><row><cell>1980</cell><cell>Feb.</cell><cell>60</cell><cell>35</cell><cell>27</cell><cell>75</cell><cell>3</cell></row><row><cell>1987</cell><cell>Aug.</cell><cell>357.1</cell><cell>30</cell><cell>23</cell><cell>86</cell><cell>2</cell></row><row><cell>1987</cell><cell>Nov.</cell><cell>10</cell><cell>35</cell><cell>24</cell><cell>80</cell><cell>3</cell></row><row><cell>1989</cell><cell>Mar.</cell><cell>57</cell><cell>35</cell><cell>25</cell><cell>77</cell><cell>3</cell></row><row><cell>1991</cell><cell>Apr.</cell><cell>108.9</cell><cell>32</cell><cell>24</cell><cell>83</cell><cell>2</cell></row><row><cell>1998</cell><cell>Sept.</cell><cell>259.3</cell><cell>34</cell><cell>24</cell><cell>75</cell><cell>3</cell></row><row><cell>Mean</cell><cell></cell><cell>142.05</cell><cell>33.5</cell><cell>24.5</cell><cell>79.33333</cell><cell>2.666667</cell></row><row><cell>SD</cell><cell></cell><cell>136.0117</cell><cell>2.073644</cell><cell>1.378405</cell><cell>4.501851</cell><cell>0.516398</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 5 : Sample Pittnet software cluster result</head><label>5</label><figDesc></figDesc><table><row><cell>Year</cell><cell>Months</cell><cell cols="2">TotalRainfall MaxTemp</cell><cell>MinTemp</cell><cell>RH</cell><cell>FireDangerIndex</cell></row><row><cell>1989</cell><cell>Feb.</cell><cell>18.4</cell><cell>35</cell><cell>22</cell><cell>51</cell><cell>4</cell></row><row><cell>1990</cell><cell>Feb.</cell><cell>40.3</cell><cell>35</cell><cell>23</cell><cell>64</cell><cell>4</cell></row><row><cell>1990</cell><cell>Mar.</cell><cell>11.7</cell><cell>37</cell><cell>25</cell><cell>69</cell><cell>4</cell></row><row><cell>1994</cell><cell>Jan.</cell><cell>1.3</cell><cell>33</cell><cell>20</cell><cell>45</cell><cell>4</cell></row><row><cell>1997</cell><cell>Mar.</cell><cell>122.2</cell><cell>35</cell><cell>23</cell><cell>62</cell><cell>4</cell></row><row><cell>1998</cell><cell>Feb.</cell><cell>2</cell><cell>36</cell><cell>25</cell><cell>60</cell><cell>4</cell></row><row><cell>2000</cell><cell>Mar.</cell><cell>48.8</cell><cell>37</cell><cell>25</cell><cell>62</cell><cell>4</cell></row><row><cell>2001</cell><cell>Mar.</cell><cell>15</cell><cell>37</cell><cell>25</cell><cell>60</cell><cell>4</cell></row><row><cell>2001</cell><cell>Apr.</cell><cell>98</cell><cell>35</cell><cell>24</cell><cell>60</cell><cell>4</cell></row><row><cell>Mean</cell><cell></cell><cell>39.74444</cell><cell>35.55556</cell><cell>23.55556</cell><cell>59.22222</cell><cell>4</cell></row><row><cell>SD</cell><cell></cell><cell>43.34732</cell><cell>1.333333</cell><cell>1.740051</cell><cell>7.120003</cell><cell>0</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 6 : Sample RapidMiner software cluster result</head><label>6</label><figDesc></figDesc><table><row><cell>Year</cell><cell>Months</cell><cell>TotalRainfall</cell><cell>MaxTemp</cell><cell>MinTemp</cell><cell>RH</cell><cell>FireDangerIndex</cell></row><row><cell>1989</cell><cell>Feb.</cell><cell>18.4</cell><cell>35</cell><cell>22</cell><cell>51</cell><cell>4</cell></row><row><cell>1994</cell><cell>Jan.</cell><cell>1.3</cell><cell>33</cell><cell>20</cell><cell>45</cell><cell>4</cell></row><row><cell>1998</cell><cell>Feb.</cell><cell>2</cell><cell>36</cell><cell>25</cell><cell>60</cell><cell>4</cell></row><row><cell>2001</cell><cell>Mar.</cell><cell>15</cell><cell>37</cell><cell>25</cell><cell>60</cell><cell>4</cell></row><row><cell>2004</cell><cell>Mar.</cell><cell>1.5</cell><cell>35.8</cell><cell>25.1</cell><cell>60</cell><cell>3</cell></row><row><cell>Mean</cell><cell></cell><cell>7.64</cell><cell>35.36</cell><cell>23.42</cell><cell>55.2</cell><cell>3.8</cell></row><row><cell>SD</cell><cell></cell><cell>8.361399</cell><cell>1.499333</cell><cell>2.319914</cell><cell>6.906519</cell><cell>0.447214</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 7 : Sample RapidMiner software cluster result</head><label>7</label><figDesc></figDesc><table><row><cell>Year</cell><cell>Months</cell><cell>TotalRainfall</cell><cell>MaxTemp</cell><cell>MinTemp</cell><cell>RH</cell><cell>FireDangerIndex</cell></row><row><cell>1979</cell><cell>Jul.</cell><cell>291.2</cell><cell>29</cell><cell>23</cell><cell>85</cell><cell>2</cell></row><row><cell>1979</cell><cell>Sept.</cell><cell>269</cell><cell>29</cell><cell>23</cell><cell>86</cell><cell>2</cell></row><row><cell>1979</cell><cell>Oct.</cell><cell>223.6</cell><cell>31</cell><cell>24</cell><cell>86</cell><cell>2</cell></row><row><cell>1979</cell><cell>Nov.</cell><cell>261.4</cell><cell>32</cell><cell>24</cell><cell>83</cell><cell>2</cell></row><row><cell>1980</cell><cell>Jun</cell><cell>306</cell><cell>31</cell><cell>23</cell><cell>82</cell><cell>2</cell></row><row><cell>1980</cell><cell>Aug.</cell><cell>427.4</cell><cell>28</cell><cell>23</cell><cell>88</cell><cell>2</cell></row><row><cell>1980</cell><cell>Sept.</cell><cell>333.5</cell><cell>29</cell><cell>23</cell><cell>90</cell><cell>2</cell></row><row><cell>1981</cell><cell>Sept.</cell><cell>233.9</cell><cell>30</cell><cell>23</cell><cell>86</cell><cell>2</cell></row><row><cell>1981</cell><cell>Oct.</cell><cell>225.1</cell><cell>31</cell><cell>24</cell><cell>83</cell><cell>2</cell></row><row><cell>1983</cell><cell>May</cell><cell>250.7</cell><cell>31</cell><cell>24</cell><cell>85</cell><cell>2</cell></row><row><cell>1984</cell><cell>May</cell><cell>223</cell><cell>32</cell><cell>23</cell><cell>86</cell><cell>2</cell></row><row><cell>1984</cell><cell>Jun</cell><cell>233.6</cell><cell>30</cell><cell>22</cell><cell>82</cell><cell>2</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">CoRI'16, Sept 7-9, 2016, Ibadan, Nigeria.   </note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Categorical data visualization and clustering using subjective factors</title>
		<author>
			<persName><forename type="first">C</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ding</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Data &amp; Knowledge Engineering</title>
		<imprint>
			<publisher>Elsevier</publisher>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">An Extension of Self-Organizing Maps to Categorical Data</title>
		<author>
			<persName><forename type="first">N</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">C</forename><surname>Marques</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th Portuguese conference on progress in Artificial Intelligence</title>
				<meeting>the 12th Portuguese conference on progress in Artificial Intelligence<address><addrLine>Berlin; Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="304" to="313" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Data exploration using self-organizing maps</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kaski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Acta Polytechnica Scandinavica, Mathematics, Computing and Management in Engineering Series</title>
		<imprint>
			<biblScope unit="volume">82</biblScope>
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">The Self-Organizing Map (SOM)</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kohonen</surname></persName>
		</author>
		<ptr target="http://www.cis.hut.fi/research/reports/quinquennial/" />
		<imprint>
			<date type="published" when="1999">1999 (covering 1994-1998; accessed January 2006)</date>
		</imprint>
		<respStmt>
			<orgName>Helsinki University of Technology, Laboratory of Computer and Information Science, Neural Networks Research Centre, Quinquennial Report</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Using Smoothed Data Histograms for Cluster Visualization in Self Organizing Maps</title>
		<author>
			<persName><forename type="first">E</forename><surname>Pampalk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rauber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Merkl</surname></persName>
		</author>
		<idno>OeFAI-TR-2002-29</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Artificial Neural Networks</title>
		<title level="s">Springer Lecture Notes in Computer Science</title>
		<meeting>the International Conference on Artificial Neural Networks<address><addrLine>Madrid, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Identification of rainfall patterns over the Valley of Mexico</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">J</forename><surname>Pelczer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">L</forename><surname>Cisneros</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">11th International Conference on Urban Drainage</title>
				<meeting><address><addrLine>Edinburgh, Scotland, UK</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Principe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">R</forename><surname>Euliano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lefebvre</surname></persName>
		</author>
		<title level="m">Neural and Adaptive Systems: Fundamentals Through Simulations</title>
				<imprint>
			<publisher>John Wiley and Sons Inc</publisher>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page">656</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<ptr target="http://www.statsoftinc.com/txtbook/glosd.html#DataMining" />
		<title level="m">Statsoft Electronic Statistics Textbook</title>
				<imprint>
			<publisher>StatSoft, Inc.</publisher>
			<date type="published" when="2002">2002 (accessed June 2002)</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Data Mining and Knowledge Discovery with Emergent Self-Organizing Feature Maps for Multivariate Time Series</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ultsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Kohonen Maps</title>
				<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="33" to="46" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Maps for the Visualization of high-dimensional Data Spaces</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ultsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Workshop on Self Organizing Maps</title>
				<meeting>Workshop on Self Organizing Maps<address><addrLine>Kyushu, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003a</date>
			<biblScope unit="page" from="225" to="230" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">U*-Matrix: a Tool to visualize Clusters in high dimensional Data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ultsch</surname></persName>
		</author>
		<idno>No. 36</idno>
		<imprint>
			<date type="published" when="2003">2003b</date>
		</imprint>
		<respStmt>
			<orgName>Computer Science Department, University of Marburg, Germany</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ultsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Moerchen</surname></persName>
		</author>
		<idno>No. 46</idno>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
		<respStmt>
			<orgName>Dept. of Mathematics and Computer Science, University of Marburg, Germany</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Self-and Superorganizing Maps in R: The kohonen Package</title>
		<author>
			<persName><forename type="first">R</forename><surname>Wehrens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M C</forename><surname>Buydens</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Statistical Software</title>
		<imprint>
			<publisher>American Statistical Association</publisher>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="issue">5</biblScope>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Clustering Mixed Categorical and Numeric Data</title>
		<author>
			<persName><forename type="first">Zengyou</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiaofei</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shengchun</forename><surname>Deng</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="6" to="29" />
			<pubPlace>Harbin; P. R. China</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Department of Computer Science and Engineering, Harbin Institute of Technology</orgName>
		</respStmt>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
