A study of SOM clustering software implementations

A. B. Adeyemo
Computer Science Department
University of Ibadan
Nigeria
+2348052107367
sesanadeyemo@gmail.com

CoRI'16, Sept 7–9, 2016, Ibadan, Nigeria.

ABSTRACT
Clustering algorithms generally suffer from some well-known problems that Self-Organizing Map (SOM) algorithms are adept at handling. While there are many variants of the SOM algorithm, software programs that implement it have tended to give varying results even when tested on the same data sets. This can have serious implications when the goal of the clustering is novelty detection. In this study the performance of some SOM clustering software was compared and the results are presented.

CCS Concepts
• General and reference ➝ Cross-computing tools and techniques ➝ Empirical studies

Keywords
Comparative Analysis; Clustering; Self Organizing Maps.

1. INTRODUCTION
In the clustering process, data is grouped in such a way that the intra-cluster similarity is maximized while the inter-cluster similarity is minimized. Data can be described by either categorical or numeric features. Due to the differences in the characteristics of these two kinds of data, attempts to develop criteria functions for mixed data have not been very successful [15]. There are two widely used clustering methods: the hierarchical and the nonhierarchical (partitional) methods. The hierarchical clustering process can be categorized as divisive, when a large data set is divided into several small groups, and agglomerative, when small data sets are put together to create a larger cluster.

Self-Organizing Maps (SOM) are competitive networks that provide a "topological" mapping from the input space to the clusters [4]. The SOM was inspired by the way in which various human sensory impressions are neurologically mapped into the brain such that spatial or other relations among stimuli correspond to spatial relations among the neurons. In a SOM, the neurons (clusters) are organized into a grid which is usually two-dimensional, but sometimes one-dimensional or (rarely) of three or more dimensions. The reason for using one- and two-dimensional grids is that structures of higher dimensionality cause problems with data display and cannot be displayed on the monitor. The SOM working algorithm is a variant of multidimensional vector clustering, of which the K-means clustering algorithm is an example [9].

The SOM neural network uses a competitive learning algorithm and is a method for unsupervised learning, based on a grid of artificial neurons whose weights are adapted to match input vectors in a training set. The SOM algorithm is fed with feature vectors, which can be of any dimension. The algorithm for the training of the SOM [4] is explained easily in terms of a set of artificial neurons, each having its own physical location on the output map. The neurons take part in a winner-take-all process in which the node whose weight vector is closest to the input vector is declared the winner, and its weights are adjusted to bring them closer to the input vector.

Figure 1: Illustration of the updating of the Best Matching Unit (BMU) of a SOM grid and its neighbors

In each training step, one sample vector x from the input data set is chosen randomly and a similarity measure is calculated between it and all the weight vectors of the map. The Best-Matching Unit (BMU), denoted as c, is the unit whose weight vector has the greatest similarity with the input sample x (figure 1). The similarity is usually defined by means of a distance measure, usually the Euclidean distance. The BMU is defined mathematically as the processing element for which the expression:

$$ d(x, w_c) = \min_i \{ d(x, w_i) \} \tag{1} $$

holds, where d is the distance measure and w_i is the weight vector of unit i.

Each node has a set of neighbors. When a node wins a competition, the neighbors' weights are also changed, but not as much as that of the winning node. The further the neighbor is from the winner, the smaller its weight change. The SOM update rule for the weight vector of the unit i is given mathematically as:

$$ w_i(t+1) = w_i(t) + h_{c(x),i}(t) \, [x(t) - w_i(t)] \tag{2} $$

where
t represents the sample index for each presentation of a sample x
h_{c(x),i} represents the neighborhood function around the winner unit c, with neighborhood radius r(t).

The neighborhood function is like a smoothing kernel that is time-variable. It is a decreasing function of the distance between the ith and cth reference vectors on the map grid. The neighborhood function is usually expressed as the Gaussian function, which can be expressed mathematically as:

$$ h_{c(x),i}(t) = \alpha(t) \, \exp\left( -\frac{\lVert r_i - r_c \rVert^2}{2\sigma^2(t)} \right) \tag{3} $$

where
α(t) represents the learning rate factor and takes values 0 < α(t) < 1
σ(t) represents the width of the neighborhood function, which decreases monotonically with the regression steps
‖r_i − r_c‖ is the distance between units i and c on the map grid.

A simpler definition of the neighborhood function given by Kohonen [4] is:

$$ h_{c(x),i}(t) = \alpha(t) \tag{4} $$

if ‖r_i − r_c‖ is smaller than a given radius around node c (the radius also being a monotonically decreasing function of the regression steps), and h_{c(x),i} = 0 otherwise. σ(t) is a diminishing function of time. At the beginning of the learning procedure it is fairly large, but it is made to gradually shrink during learning. Towards the end of learning a single winning processing element is trained. A linear diminishing function of time is usually used. The learning process consists of winner selection by Equation (1) and adaptation of the synaptic weights by Equation (2). This process is repeated for each input vector, usually for a large number of cycles, with different inputs producing different winners. The network therefore associates output nodes with groups or patterns in the input data set. The SOM algorithm is very simple and allows for many subtle adaptations.

There are some visual displays that are used to "determine" where the natural cluster boundaries are in the SOM. Some of the visual tools that can be used are Histograms [6], Component Plane displays [3], and U-matrix, P-matrix and U*-matrix displays [10], [11], [12], [13]. An important concept in interpreting these displays is the interaction of two properties of the SOM: the neighborhood relationship and the density mapping. Neighboring neurons in the SOM cannot be too far away from each other (in order to maintain their similarity), but the SOM also tends to place more neurons in areas of high input density (for example, logical clusters). Because of this, some neurons will be placed in areas between natural clusters, which are typically low input density areas (so that the map can "stretch" between clusters).

The standard SOM algorithm uses numeric variables and the Euclidean distance function. The arithmetic operations used during the learning phase to update the feature vectors cannot be used with categorical values, so the SOM was not directly designed to work with categorical variables. The method usually adopted is to translate categories to numbers during data pre-processing and then train on the transformed data using the standard SOM algorithm [2]. The Kohonen SOM clustering algorithm has also been used for classification purposes with remarkable results. There is a fundamental difference between the clustering process and the classification process: clustering is an unsupervised process while classification is supervised. Usually data clustering is used as a pre-processor for classification purposes [8].

A rich variety of versions of the basic SOM algorithm have been proposed. Some of the variants aim at improving the preservation of topology by using more flexible map structures instead of the fixed grid. Some of these methods, however, cannot be used for visualization as easily as the regular grid. Some variants aim at reducing the computational complexity of the SOM [3]. Experiments using different distance measures, map topologies, and training parameters such as the learning rate and neighborhood function can be carried out.

Using identical settings, training a SOM over different iterations can lead to different mappings because of the random initialization. Yet it has been shown that the conclusions drawn from the map remain remarkably consistent, which makes it a very useful tool in many different circumstances [14]. Some of the desirable features that good SOM clustering software should have include:

1. Being able to set the neighborhood kernel function and to set the start value for the neighborhood function (learning radius): The neighborhood function determines how strongly the processing elements are connected to each other. Neighborhoods of different sizes in different neuron configurations (e.g. rectangular and hexagonal lattices) can be used. The simplest neighborhood function is the bubble (winner-takes-all): it is constant (or 1) over the whole neighborhood of the winner unit and zero elsewhere. Usually the neighborhood function is expressed as a Gaussian function and, as expected, using the winner-takes-all function retrieves fewer clusters than the Gaussian function.
2. Being able to set the activation function and weight initialization methods: Before the training, initial values are given to the prototype vectors of the SOM. The SOM is very robust with respect to the initialization process; however, when properly accomplished, it allows the algorithm to converge faster to a good solution.
Initialization procedures that have been used are: Random initialization, where the weight vectors are initialized with small random values; Sample initialization, where the weight vectors are initialized with random samples drawn from the input data set; Linear initialization, where the weight vectors are initialized in an orderly fashion along the linear subspace spanned by the two principal eigenvectors of the input data set.
3. Being able to set the choice of cooling strategy during training: for example linear or exponential.
4. Being able to set the distance measure to be used, for example, Euclidean, Manhattan and Maximum value: The distance measure between data points is an important component of a clustering algorithm. If the components of the data instance vectors are all in the same physical units, then the simple Euclidean distance metric can successfully group similar data elements. The Euclidean distance in a two- or three-dimensional space is the actual geometric distance between objects in the space. However, it has been observed that even the Euclidean distance can sometimes be misleading, because of the way the formula combines the distances between the single components of the feature vectors into one distance measure. Different formulas lead to different clusterings. Therefore, domain knowledge must be used to guide the formulation of a suitable distance measure for each particular application.
5. Being able to set the scaling technique to be used: for example z-transform, (0,1) transform, (1,-1) transform or none, depending on the clustering goal and data set.
6. Being able to set the starting and stopping learning rate: The learning rate is a decreasing function of time with values in [0,1]. It can be expressed as a linear function or as a function inversely proportional to time. Using the inverse function ensures that all input samples have approximately equal influence on the training result. Learning rate functions that have been implemented include the linear, inverse-of-time and power series functions.
7. Being able to set the training algorithm to be used: for example batch, on-line, hybrid etc. The batch algorithm has been shown to be faster [4] than the normal sequential algorithm (and the results are just as good or even better).
8. Good data visualization options: for example histograms, Hinton charts, weight charts (maps), U-Matrix, P-Matrix etc. Good result analysis and presentation functions: computation of vital statistics for evaluating the quality of the clustering, for example mean, standard deviation (or variance), correlation coefficient, t-test etc.

This work presents a comparative study of the performance of some SOM clustering software when tested on the same data set. Results are presented together with reasons for the observed variations. The study also presents the desirable features that standard SOM software should have.

2. MATERIALS AND METHODS
Agrometeorological data for FRIN headquarters, Ibadan, Nigeria was used. The data set had 254 records and the attributes in the data set were: Year (numeric), Month (text), Total Rainfall in millimeters (numeric), Minimum Temperature in Celsius (numeric), Maximum Temperature in Celsius (numeric), Relative Humidity and Fire Danger Index (numeric). The SOM software used were: NNClust, Pittnet Neural Network Educational Software and RapidMiner Studio.

The NNClust software was programmed to use only the Gaussian neighborhood function and the Euclidean distance measure. The user can input the learning rate and starting neighborhood size. The software automatically normalizes the input data between -1 and 1 and has features for generating data/result statistics and data visualization such as weight maps and radar charts. The Pittnet software also uses the Gaussian neighborhood function and Euclidean distance metric. The user defines the starting learning rate, and the software automatically normalizes the data between 0 and 1. It is a DOS based program that saves its result in a text file and has no data analysis or data visualization ability. RapidMiner Studio (Community Edition) lets the user set the learning rate and neighborhood radius and choose whether or not to normalize the data. It also has an array of tools for statistical data analysis and data visualization.

Clusters were generated using the three software packages. The arithmetic mean of each cluster group was also computed. The arithmetic mean is a measure of central tendency which describes the central location of data. Because it can be affected by extreme values in the data set and therefore be biased, it is usually used with other statistical measures such as the standard deviation. The standard deviation describes the spread of the data and is a popular measure of dispersion. It measures the average distance between a single observation and its mean.

3. RESULTS AND DISCUSSION
The meteorological data was clustered using the NNClust SOM clustering software with a starting learning rate of 0.9 and was trained over 100 epochs. The software accepts only numeric values. Non-numeric values are treated as missing values, which are replaced by the column mean. The software was set to identify a maximum of ten clusters; however, only eight clusters were generated. The software uses the number of clusters specified to create the SOM grid. The mean and standard deviation of the eight clusters were computed. Increasing the number of training cycles did not improve the results. Table 1 presents the summary of the eight clusters, while figure 2 presents the chart of the cluster means.

Figure 2: Chart of NNClust cluster means (series: TotalRainfall, MaxTemp, MinTemp, RH, FireDangerIndex)

The meteorological data was trained using the Pittnet software with a starting learning rate of 0.9 and was set to train for 100 epochs, although the software stops training as soon as the maximum number of clusters has been generated. The software requires the user to specify the expected number of clusters a priori. This number is used in conjunction with the number of input signals (attributes) to determine the SOM grid size. The expected number of clusters was set to ten. The software identified only four clusters. The mean and standard deviation of the clusters were computed. Table 2 presents the summary of the clusters, while figure 3 presents the chart of the cluster means.

Figure 3: Chart of Pittnet software cluster means (series: TotalRainfall, MaxTemp, MinTemp, RH, FireDangerIndex)

The RapidMiner Studio software was used to cluster the meteorological data set using a starting learning rate of 0.9 and was trained over 100 epochs. The expected number of clusters was set at ten and the software generated ten clusters. Table 3 presents the summary of the cluster means with their standard deviations, while figure 4 presents a chart of their cluster means.

Figure 4: Chart of RapidMiner Studio cluster means (series: TotalRainfall, MaxTemp, MinTemp, RH, FireDangerIndex)

3.1 Discussion of Results
The quality of the clusters identified in the data by the three software packages can be inferred from a comparison of the mean and standard deviation of the clusters. If the value of the standard deviation is low, then the clustered records are within the same range. However, if the value is high, this suggests the presence of outliers in the clustered data records. For example, table 4 presents the clustered records for cluster 2 (table 1) for the NNClust software, which is representative of the trend observed in the clusters identified by the software. The cluster is difficult to interpret when the values in the Total Rainfall field are considered: the field has a mean of 142.05 and a standard deviation of 136.011711.

Considering the clusters identified by the Pittnet software in table 2, the same trend is observed. Table 5 presents the records for cluster 4 (table 2) of the Pittnet software cluster results. It can be observed that the cluster consists of data records which have the same value for the FireDangerIndex attribute. However, the Total Rainfall field has a mean value of 39.74444 and a standard deviation of 43.34732. The high standard deviation value implies that there are outlier data values in the clustered records.

The clusters identified by the RapidMiner software, presented in table 3, were easier to interpret. They followed the expected rainfall pattern which is known for the region where the data was collected [5]. Cluster 2 (table 3) contained records with a high FireDangerIndex (4 in four of the five records presented in table 6), while cluster 5 (table 3) contains records with the highest recorded Rainfall level in the data set. The other clusters also contained data records which can be categorized by the Rainfall level pattern of the region.

4. ACKNOWLEDGMENTS
Some of the problems found in the literature about clustering algorithms are: most clustering techniques are based on distance calculations which are very sensitive to the ranges of variables, so the values have to be normalized, yet normalization is a subjective function and these transformations cannot be carried out without creating biases; the presence of outliers in data sets creates problems in data clustering based on distance calculations when they have not been identified and removed from the data set; handling categorical variables (non-numeric data, nominal data) is a problem for most clustering algorithms, and even when data encoding methods are used they can introduce extra biases due to the number of values which the encoding introduces in the categorical variables; the selection of variables also has a large influence on clustering results, and while different weights can be assigned to variables and categorical values, this can affect the clustering quality when many variables and categorical values are involved; capturing patterns (or behaviors) hidden inside time-varying variables and modeling them is another problem, and most clustering techniques do not possess this predictive modeling capability; most clustering techniques were developed for simple laboratory generated data sets consisting of a few to several numerical variables, hence they cannot be used for large data analyses that involve many complex categorical data.

Most common implementations of data clustering algorithms suffer from these problems. SOMs are very robust and are adept at handling these problems, but this depends also on the goal of the algorithm's implementation (programming). Applications programmed for demonstration purposes cannot be used for large scale projects, and some implementations are not flexible and do not give users many options. However, if an implementation of the conventional SOM algorithm (which is usually focused on the goals of the programmer) provides enough options to the user, it is still a very robust algorithm that can be used for numerical, categorical and mixed data sets. Further work in this study is focused on the development of an open, flexible SOM clustering tool with adequate features that can be used for research purposes.

5. REFERENCES
[1] Chang C., Ding Z., (2004), "Categorical data visualization and clustering using subjective factors", Data & Knowledge Engineering, Published by Elsevier B.V.
[2] Chen N. and Marques N. C., (2005), "An Extension of Self-Organizing Maps to Categorical Data", Proceedings of the 12th Portuguese Conference on Progress in Artificial Intelligence, pp 304 - 313, (Springer-Verlag Berlin, Heidelberg ©2005)
[3] Kaski S., (1997), "Data exploration using self-organizing maps", Acta Polytechnica Scandinavica, Mathematics, Computing and Management in Engineering Series No. 82, Espoo 1997.
[4] Kohonen T, (1999), "The Self-Organizing Map (SOM)", Helsinki University of Technology, Laboratory of Computer and Information Science, Neural Networks Research Centre, Quinquennial Report (1994-1998), (Downloaded from http://www.cis.hut.fi/research/reports/quinquennial/ January 2006).
[5] Nigeria Climate Review, 2010, Nigerian Meteorological Agency, www.nimetng.org
[6] Pampalk E, Rauber A, Merkl D, (2002), "Using Smoothed Data Histograms for Cluster Visualization in Self Organizing Maps", Technical Report OeFAI-TR-2002-29, extended version published in Proceedings of the International Conference on Artificial Neural Networks, Springer Lecture Notes in Computer Science, Madrid, Spain, 2002.
[7] Pelczer I. J. and Cisneros H. L., (2008), "Identification of rainfall patterns over the Valley of Mexico", 11th International Conference on Urban Drainage, Edinburgh, Scotland, UK, 2008
[8] Principe J. C., Euliano N. R., Lefebvre W. C, (2000), Neural and Adaptive Systems: Fundamentals Through Simulations, John Wiley and Sons Inc, ISBN 0-471-35167-9, pp 656.
[9] Statsoft Electronic Statistics Textbook, (2002), Copyright, 1984-2003, (http://www.statsoftinc.com/txtbook/glosd.html#Data Mining), Downloaded June 2002.
[10] Ultsch, A., (1999), Data Mining and Knowledge Discovery with Emergent Self-Organizing Feature Maps for Multivariate Time Series, In Kohonen Maps, (1999), pp. 33-46.
[11] Ultsch A, (2003a), Maps for the Visualization of high-dimensional Data Spaces, Proc. Workshop on Self Organizing Maps, pp 225 - 230, Kyushu, Japan, 2003.
[12] Ultsch A, (2003b), U*-Matrix: a Tool to visualize Clusters in high dimensional Data, Technical Report No. 36, Computer Science Department, University of Marburg, Germany, 2003.
[13] Ultsch A., Moerchen F, (2005), ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM, Technical Report No. 46, Dept. of Mathematics and Computer Science, University of Marburg, Germany, 2005.
[14] Wehrens R., Buydens L. M. C., (2007), Self- and Super-organizing Maps in R: The kohonen Package, Journal of Statistical Software, published by the American Statistical Association, Vol. 21, Issue 5
[15] Zengyou He, Xiaofei Xu, Shengchun Deng, (2003), "Clustering Mixed Categorical and Numeric Data", Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, P. R.
China

Table 1: Summary of NNClust clusters

Cluster    Stat  TotalRainfall  MaxTemp     MinTemp  RH        FireDangerIndex
Cluster 1  Mean  3.7            32          24       83        2
           SD    0              0           0        0         0
Cluster 2  Mean  142.05         33.5        24.5     79.33333  2.666666667
           SD    2.61629509     22.627417   16.9706  4.501851  0.516397779
Cluster 3  Mean  113.313158     31.1236842  31.0605  70.54737  2.5
           SD    69.9895185     15.4557389  11.4404  45.62364  1.246560403
Cluster 4  Mean  149.99         30.8333333  30.2967  73.75333  2.333333333
           SD    98.1425436     3.53058883  20.0499  25.41582  0.546672274
Cluster 5  Mean  109.891667     30.6333333  36.1667  64.64444  2.638888889
           SD    92.1210985     4.02073199  24.3938  34.37646  0.723198364
Cluster 6  Mean  141.621277     31.7574468  27.0617  73.1617   2.617021277
           SD    97.0359995     2.63056819  13.7078  20.8623   0.644481304
Cluster 7  Mean  123.545794     31.4411215  29.4963  74.41028  2.411214953
           SD    81.8137003     2.96536463  18.4077  24.4239   0.531165877
Cluster 8  Mean  175.268966     29.3793103  23.069   86.89655  2.068965517
           SD    85.4901878     1.49794605  1.06674  4.312315  0.257880715

Table 2: Summary of the Pittnet software clusters

Cluster    Stat  TotalRainfall  MaxTemp   MinTemp     RH           FireDangerIndex
Cluster 1  Mean  50.850001      24.75     63.5        3.9          4
           SD    31.32483       0.070709  12.0208153  0.141421356  0
Cluster 2  Mean  134.3332       31.7082   23.5984375  82.4218728   2.3828125
           SD    91.137324      2.254123  1.06439596  6.908488013  0.487025284
Cluster 3  Mean  138.05185      24.64815  84.4074074  2.196296296  2.407407407
           SD    45.668999      15.90804  27.2370968  39.48311832  1.836329785
Cluster 4  Mean  39.744444      35.55556  23.5555556  59.22222133  4
           SD    43.347321      1.333333  1.74005108  7.120003363  0

Table 3: Summary of RapidMiner Studio clusters

Cluster    Stat  TotalRainfall  MaxTemp   MinTemp   RH           FireDangerIndex
cluster 0  Mean  42.35385       33.41154  23.99615  78.46153846  2.730769231
           SD    8.192056       2.308823  0.911913  7.798619207  0.603833905
cluster 1  Mean  13.50513       33.47179  23.80769  77.43589744  2.820512821
           SD    9.379343       2.342845  1.280909  6.302860135  0.451418517
cluster 2  Mean  7.64           35.36     23.42     55.2         3.8
           SD    16.15873       17.96476  13.16786  40.93966268  1.299899072
cluster 3  Mean  57.94667       25.35333  78.13333  2.726666667  2.933333333
           SD    13.23034       15.63488  11.11308  32.15964741  1.361648053
cluster 4  Mean  211.4214       23.90714  88.14286  1.871428571  2.071428571
           SD    46.93198       1.320527  4.24005   0.299816794  0.267261242
cluster 5  Mean  270.4346       30.36154  23.21923  85.19230769  2.115384615
           SD    42.68863       1.395814  0.859101  5.129837598  0.322602539
cluster 6  Mean  188.0463       30.77805  23.31463  84.90243902  2.146341463
           SD    15.90989       1.518801  0.887288  5.180757078  0.357839043
cluster 7  Mean  144.6971       31.42     23.47429  82.85714286  2.342857143
           SD    10.84353       1.991127  0.995089  7.6855206    0.481593992
cluster 8  Mean  110.85         31.84474  23.72105  82.31578947  2.473684211
           SD    9.73158        2.332462  1.076822  6.794692934  0.603451429
cluster 9  Mean  70.05862       32.27241  24.04828  81.31034483  2.482758621
           SD    8.635041       2.37936   1.180684  9.043953972  0.508547628

Table 4: Sample NNClust software cluster result

Year  Month  TotalRainfall  MaxTemp   MinTemp   RH        FireDangerIndex
1980  Feb.   60             35        27        75        3
1987  Aug.   357.1          30        23        86        2
1987  Nov.   10             35        24        80        3
1989  Mar.   57             35        25        77        3
1991  Apr.   108.9          32        24        83        2
1998  Sept.  259.3          34        24        75        3
Mean         142.05         33.5      24.5      79.33333  2.666667
SD           136.0117       2.073644  1.378405  4.501851  0.516398

Table 5: Sample Pittnet software cluster result

Year  Month  TotalRainfall  MaxTemp   MinTemp   RH        FireDangerIndex
1989  Feb.   18.4           35        22        51        4
1990  Feb.   40.3           35        23        64        4
1990  Mar.   11.7           37        25        69        4
1994  Jan.   1.3            33        20        45        4
1997  Mar.   122.2          35        23        62        4
1998  Feb.   2              36        25        60        4
2000  Mar.   48.8           37        25        62        4
2001  Mar.   15             37        25        60        4
2001  Apr.   98             35        24        60        4
Mean         39.74444       35.55556  23.55556  59.22222  4
SD           43.34732       1.333333  1.740051  7.120003  0

Table 6: Sample RapidMiner software cluster result

Year  Month  TotalRainfall  MaxTemp   MinTemp   RH        FireDangerIndex
1989  Feb.   18.4           35        22        51        4
1994  Jan.   1.3            33        20        45        4
1998  Feb.   2              36        25        60        4
2001  Mar.   15             37        25        60        4
2004  Mar.   1.5            35.8      25.1      60        3
Mean         7.64           35.36     23.42     55.2      3.8
SD           8.361399       1.499333  2.319914  6.906519  0.447214

Table 7: Sample RapidMiner software cluster result

Year  Month  TotalRainfall  MaxTemp   MinTemp   RH        FireDangerIndex
1979  Jul.   291.2          29        23        85        2
1979  Sept.  269            29        23        86        2
1979  Oct.   223.6          31        24        86        2
1979  Nov.   261.4          32        24        83        2
1980  Jun    306            31        23        82        2
1980  Aug.   427.4          28        23        88        2
1980  Sept.  333.5          29        23        90        2
1981  Sept.  233.9          30        23        86        2
1981  Oct.   225.1          31        24        83        2
1983  May    250.7          31        24        85        2
1984  May    223            32        23        86        2
1984  Jun    233.6          30        22        82        2
1985  Jul.   307.2          30        23        86        2
1985  Aug.   232.2          30        23        89        2
1986  Jun    312.9          31        23        83        2
1986  Sept.  374.1          29        22        84        2
1987  Jul.   246.8          30        23        85        2
1987  Aug.   357.1          30        23        86        2
1987  Sept.  252.5          31        23        87        2
1988  Jun    242.9          30        22        82        2
1988  Jul.   240.9          29        23        84        2
1988  Sept.  225.1          30        23        87        2
1989  May    259.2          32        23        83        2
1989  Jun    338.7          31        23        86        2
1989  Aug.   275            29        22        88        2
1990  Apr.   233.8          33        24        82        3
1990  Jul.   293.6          29        23        90        2
1990  Oct.   255.4          31        23        85        2
1991  May    258.2          32        24        84        2
1991  Jul.   306.6          29        23        90        2
1992  Sept.  275.4          29        23        90        2
1992  Oct.   276.3          31        23        88        2
1993  Jul.   261            29        27        87        2
1993  Aug.   237.7          29        23        90        2
1993  Sept.  255.5          30        23        86        2
1994  Sept.  236            30        23        89        2
1995  May    334.3          31        24        81        2
1995  Aug.   304.2          29        23        91        2
1996  Aug.   224.7          30        23        89        2
1996  Sept.  304.1          29        22        90        2
1997  Apr.   261.7          32        24        70        3
1998  May    245.4          34        25        70        3
1998  Sept.  259.3          34        24        75        3
2000  Jul.   220.4          29        23        73        3
2000  Aug.   263.8          29        23        85        2
2001  May    265            33        24        74        3
2001  Sept.  275.2          29        22        90        2
2002  Oct.   265            29        24        87        2
2003  Jun    275.3          30.6      24.5      92        2
2003  Sept.  226            30.8      22.4      92        2
2003  Oct.   254.9          32        23.2      92        2
2006  Sept.  250.8          30.4      22.3      86        2
Mean         270.4346       30.36154  23.21923  85.19231  2.115385
SD           42.68863       1.395814  0.859101  5.129838  0.322603
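
The training procedure described in the introduction (winner selection, a Gaussian neighborhood around the winner, and the weight update) can be illustrated with a minimal sketch. This is not the implementation used by NNClust, Pittnet or RapidMiner. The function names, the sample initialization, the linear cooling of both the learning rate and the neighborhood width, the per-epoch shuffling of samples, and the lower bound of 0.5 on the width are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def best_matching_unit(x, weights):
    """Winner selection: index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: euclidean(x, weights[i]))


def gaussian_neighborhood(r_i, r_c, alpha, sigma):
    """Learning rate alpha scaled by a Gaussian of the grid distance to the winner."""
    grid_dist_sq = sum((p - q) ** 2 for p, q in zip(r_i, r_c))
    return alpha * math.exp(-grid_dist_sq / (2.0 * sigma ** 2))


def train_som(data, rows, cols, epochs=100, alpha0=0.9, seed=0):
    """Train a rows x cols SOM on a list of numeric vectors (hypothetical helper)."""
    rng = random.Random(seed)
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    # Sample initialization: each unit starts at a randomly drawn data record.
    weights = [list(rng.choice(data)) for _ in positions]
    sigma0 = max(rows, cols) / 2.0  # assumed starting neighborhood width
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        # Present every record once per epoch, in random order.
        for x in rng.sample(data, len(data)):
            remaining = 1.0 - step / total_steps
            alpha = alpha0 * remaining            # linear cooling of the learning rate
            sigma = max(sigma0 * remaining, 0.5)  # shrinking width, assumed floor
            c = best_matching_unit(x, weights)
            for i in range(len(weights)):
                h = gaussian_neighborhood(positions[i], positions[c], alpha, sigma)
                # Move unit i towards x in proportion to h.
                weights[i] = [w + h * (xj - w) for w, xj in zip(weights[i], x)]
            step += 1
    return weights, positions


# Usage on a tiny made-up data set (not the FRIN data).
toy_data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
            [9.8, 10.0], [10.0, 9.9], [10.1, 10.2]]
toy_weights, toy_positions = train_som(toy_data, rows=1, cols=2, epochs=50)
toy_labels = [best_matching_unit(x, toy_weights) for x in toy_data]
```

Even in this small sketch the initialization method, cooling schedule, neighborhood width and grid size are free choices, which is consistent with the observation that different implementations can return different clusters for the same data.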
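
Section 3.1 judges cluster quality from the mean and standard deviation of each cluster. As a small worked check, the sketch below recomputes the Total Rainfall statistics for the six records listed in Table 4. It assumes the sample standard deviation (n - 1 denominator), because that variant reproduces the tabulated value to four decimal places; the source does not state which variant was used, so the population value is computed alongside for comparison.

```python
import statistics

# Total Rainfall values (mm) of the six records listed in Table 4.
total_rainfall = [60, 357.1, 10, 57, 108.9, 259.3]

mean = statistics.mean(total_rainfall)
sample_sd = statistics.stdev(total_rainfall)       # n - 1 denominator
population_sd = statistics.pstdev(total_rainfall)  # n denominator

print(f"mean={mean:.2f} sample_sd={sample_sd:.4f} population_sd={population_sd:.4f}")
```

The mean and sample standard deviation round to the 142.05 and 136.0117 reported in Table 4. A spread nearly as large as the mean is what the discussion reads as a sign of outliers within the cluster.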