Anomaly Detection Using Similarity-based One-Class SVM for Network Traffic Characterization

Bouchra Lamrini1, Augustin Gjini1, Simon Daudin1, François Armando1, Pascal Pratmarty1 and Louise Travé-Massuyès2
1 LivingObjects, Toulouse, France
e-mail: {bouchra.lamrini,augustin.gjini,simon.daudin,françois.armando,pascal.pratmarty}@livingobjects.com
2 LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
e-mail: louise@laas.fr

Abstract

In this paper, we investigate an unsupervised machine learning method based on one-class Support Vector Machines for anomaly detection in network traffic. In the absence of any prior expert knowledge on anomalous data, we propose the use of a similarity measure for Multivariate Time Series to evaluate the output results and select the best model. A set of Key Performance Indicators, oriented towards network and traffic monitoring, has been used to demonstrate the promising performance of the unsupervised learning approach.

Index Terms: Anomaly Detection, Support Vector Machines (SVMs), One-Class SVMs, Unsupervised Learning, Model Selection, Similarity Measure, Multivariate Time Series (MTS).

1 Introduction

Anomaly detection aims at identifying unusual patterns in data that do not conform to expected behavior [1]. Depending on the application field, these non-conforming patterns are referred to as anomalies, aberrations, discordant observations, exceptions, novelties, outliers, peculiarities, contaminants, surprises, or strangeness. Applications range from intrusion detection, e.g. identifying strange patterns in network traffic that could signal a hack [2], to system health monitoring, e.g. spotting a malignant tumor in an MRI image scan [3], and from fraud detection in credit card transactions [4] to fault detection in operating environments [5]. In this paper we are interested in anomaly detection in network traffic.

Support Vector Machines (SVMs) have been one of the most successful machine learning techniques and can be used in a variety of classification applications. SVMs perform at least as well as other methods in terms of the generalization error [6]. Many factors contribute to the high popularity of SVMs today. First of all, their theoretical foundations have been deeply investigated and they come with a convex optimization procedure ensuring that the global optimum will be reached. Moreover, the solution is sparse, making it very efficient in comparison to other kernel-based approaches [7]. In addition, they may use a non-linear transformation in the form of a kernel, which even allows SVMs to be considered as a dimensionality reduction technique [8].

One-Class SVMs [9] have been devised for cases in which only one class is known and the problem is to detect anything outside this class. This is known as novelty detection and refers to the automatic identification of unforeseen or abnormal phenomena [1; 10; 11], i.e. outliers, embedded in a large amount of normal data. In contrast to traditional SVMs, One-Class SVMs (OC-SVM) learn a decision boundary that achieves maximum separation between the samples of the known class and the origin [12]. Only a small fraction of data points are allowed to lie on the other side of the decision boundary: those data points are considered as outliers.

Anomaly detection is particularly important for network traffic. The observed growth rate of informational and economic damage caused by intentional or unintentional attacks, faults, and anomalies has been a driving force behind efforts to ensure that network monitoring systems are able to detect abnormal activity and characterize it.
The limitations of computing and storage resources require competence and ingenuity to effectively characterize ever-changing network traffic trends.

Non-availability of labelled data, the high cost of constituting labelled training data, and the need to identify anomalous and novel observations without necessarily having seen an example of that behaviour in the past are the main challenges tackled in this work. A central issue is model selection, i.e. the choice of the optimal hyper-parameters that define the OC-SVM learning configuration. This requires a method to evaluate the results.

This paper contributes to this problem by evaluating the results of the trained model through a comparison of the samples predicted normal with the samples of the training set. Because samples are composed of a set of signals over a temporal window, we propose to use a similarity index for Multivariate Time Series (MTS) called Eros (Extended Frobenius Norm). The results of the model are evaluated iteratively for different hyper-parameters of OC-SVM and the model that evaluates best is selected. We show that OC-SVM in combination with the Eros index (see Section 4.1) can create automatically tuned, reliable classifiers at a reasonable computation cost.

The remainder of this paper is organized as follows. Section 2 presents our case study. Section 3 is devoted to an overview of SVMs and One-Class SVMs. In Section 4, we present the Eros similarity measure used to search for the best training model for anomaly detection. Experimental results and related discussions are provided in Section 5 to demonstrate the performance of the approach. Finally, Section 6 concludes the paper.

2 Case Study

Processing network traffic involves dealing with an immense amount of data that is quickly and constantly varying. Considering the enormous amount of data involved, it is very easy for malicious activities to go undetected, especially without any a priori knowledge about the nature of the traffic, as is often the case in the network domain.

In this study, data was collected from a real-time monitoring platform dedicated to ensuring key application performance. 51 sites using applications of the same kind, with roughly the same uses at the same time, were chosen. For each application, we carefully selected four relevant Key Performance Indicators (KPIs):

1. Total Incoming Traffic.
2. Total Outgoing Traffic.
3. Server Delay, i.e. the connection time to the server, which sets the expiration time for sending a request.
4. Network Delay, which specifies how long it takes for a bit of data to travel across the network from one node to another.

A history of two months (3408 samples) of data generated at a 5-minute rate was collected for each of the four KPIs above and for each site. Data was segmented into time-windows wi, each of 48 points. Figure 1 shows the data contained in the training set. Each time-window wi ∈ {w0, ..., wl}, l = 70, is identified between vertical lines, and there are 70 samples.

Figure 1: Training Data Set. The y-axis represents the KPI signals. From top to bottom: Total Incoming Traffic, Total Outgoing Traffic, Server Delay, and Network Delay.

Let us notice that the data samples provided to the OC-SVM classification algorithm are multivariate, each composed of four KPI segments over the same time-window. The idea is to detect insidious problems that require an analysis of the signals' underlying interactions. We therefore want to detect “abnormal windows”.
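As a concrete illustration of this segmentation step, the sketch below splits a two-month KPI history into consecutive 48-point windows. It assumes the history of one site is available as a pandas DataFrame with one column per KPI; the column names, the function name and the use of pandas are our own assumptions, not details given by the authors.

```python
import pandas as pd

# Hypothetical column names; the paper only names the four indicators.
KPI_COLUMNS = ["incoming_traffic", "outgoing_traffic", "server_delay", "network_delay"]
WINDOW_SIZE = 48  # 48 points at a 5-minute rate = 4 hours


def segment_into_windows(kpi_history: pd.DataFrame, window_size: int = WINDOW_SIZE):
    """Split a chronologically ordered KPI history into consecutive,
    non-overlapping windows of `window_size` samples each.
    An incomplete trailing window is dropped so that every sample given
    to the classifier covers exactly the same duration."""
    n_windows = len(kpi_history) // window_size
    return [
        kpi_history.iloc[i * window_size:(i + 1) * window_size][KPI_COLUMNS]
        for i in range(n_windows)
    ]


# Example: the 3408 samples of one site yield the seventy-odd 4-hour
# windows w_i described above.
# windows = segment_into_windows(site_history)
```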
Each segment is characterized by seven statistical attributes: minimum (MIN), maximum (MAX), mean (MEAN), median (MED), standard deviation (STD), number of average crossings (nbpMean), and squared mean error (SME) computed between the raw data and the linear fit. These attributes are used by OC-SVM after normalization. Figure 2 shows the attributes built for the 70 segments of the four KPIs mentioned above. The seven attributes MEAN, MED, MAX, MIN, STD, nbpMean, and SME are illustrated in the sub-figures from left to right. We can already notice some overruns that will make a considerable contribution to the classifier profile defined by the decision function.

Figure 2: Attributes built for training data KPIs. Each point is calculated on a segment of 48 points.
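The attribute computation can be sketched as follows. This is a non-authoritative illustration: the authors do not give formal definitions of nbpMean and SME, so counting crossings of the segment mean and taking the mean squared residual of a first-order least-squares fit are our assumptions.

```python
import numpy as np


def segment_attributes(segment: np.ndarray) -> np.ndarray:
    """Compute the seven statistical attributes of one KPI segment
    (a 1-D array of 48 raw values)."""
    mean = segment.mean()
    # Number of times the signal crosses its own mean (assumed meaning of nbpMean).
    crossings = np.count_nonzero(np.diff(np.sign(segment - mean)))
    # Squared mean error between the raw data and a first-order (linear) fit.
    t = np.arange(len(segment))
    slope, intercept = np.polyfit(t, segment, deg=1)
    sme = np.mean((segment - (slope * t + intercept)) ** 2)
    return np.array([
        segment.min(),        # MIN
        segment.max(),        # MAX
        mean,                 # MEAN
        np.median(segment),   # MED
        segment.std(),        # STD
        crossings,            # nbpMean
        sme,                  # SME
    ])


def window_feature_vector(window) -> np.ndarray:
    """window: one 48-point DataFrame with one column per KPI, as produced
    by the segmentation sketch above. The 7 attributes of the 4 KPI segments
    are concatenated into a single 28-dimensional feature vector; min-max
    normalization (Section 5) is applied afterwards on the whole training
    matrix, e.g. with sklearn.preprocessing.MinMaxScaler."""
    return np.concatenate([segment_attributes(window[c].to_numpy())
                           for c in window.columns])
```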
3 An Unsupervised Similarity-based Method for Anomaly Detection

Support Vector Machines (SVMs) have always been of interest for anomaly detection because of their ability to provide non-linear classification through a kernel function. Via this short overview, we show that SVMs are theoretically well founded. We briefly introduce the basic concepts of SVMs and then focus on the OC-SVM adopted in this study. A more detailed presentation can be found in [13], and a good example using the "LibSVM" library for Matlab is available online.

3.1 Support Vector Machines

Let us consider traditional two-class support vector machines in which we are given a set of n training instances S = {(x1, y1), (x2, y2), ..., (xn, yn)}, xi ∈ R^d, where yi is the class label of instance xi and yi ∈ {−1, +1}. The linear SVM classifier recovers an optimal separating hyperplane maximizing the "margin" of the classifier, with equation w^T x + b = 0, where w ∈ F and b ∈ R are two parameters which determine the position of the decision hyperplane in the feature space F (its orientation is tuned by w and its displacement by b). The decision function can thus be generally written as:

f(x; w, b) = sign(w^T x + b) ∈ {−1, +1}    (1)

where sign(w^T x + b) = +1 if (w^T x + b) ≥ 0, and −1 otherwise.

The concept of SVMs is to find (w, b) such that the hyperplane is positioned at maximum distance from the nearest training samples of the two classes, in order to reduce the generalization error. This distance defines the "margin". SVMs were first proposed for linearly separable classification tasks, but they have been extended to non-linearly separable classification problems. Some samples are allowed to violate the margin (soft-margin SVMs) and a non-linear decision boundary can be obtained by projecting the data into a higher-dimensional space thanks to a non-linear function Φ(x). Data points may not be linearly separable in their original space but they are "lifted" into a feature space F where a hyperplane can separate them. When that hyperplane is projected back into the input space, it has a non-linear shape. To prevent the SVM classifier from over-fitting noisy data, slack variables ξi are introduced to allow some data points to lie within the margin, and the parameter C > 0 in (Eq. 2) tunes the trade-off between the classification error on the training data and margin maximization. The objective function of SVM classifiers has the following minimization formulation:

min_{w, b, ξ}  ‖w‖²/2 + C Σ_{i=1}^{n} ξi    (2)

subject to: yi (w^T Φ(xi) + b) ≥ 1 − ξi,  ξi ≥ 0,  i = 1, ..., n.

The minimization problem is solved using Lagrange multipliers αi, i = 1, ..., n. The new decision rule for a data point x is:

f(x) = sign( Σ_{i=1}^{n} αi yi K(x, xi) + b )    (3)

Every αi > 0 is weighted in the decision function and thus "supports the machine". Since SVMs are considered to be sparse, there are relatively few Lagrange multipliers with a non-zero value.

The function K(x, xi) = Φ(x)^T Φ(xi) is known as the kernel function. Since the outcome of the decision function only relies on the dot-products of the vectors in the feature space F (i.e. all the pairwise distances between the vectors), it is not necessary to perform an explicit projection. As long as a function K provides the same results, it can be used instead. This is known as the kernel trick. Popular choices for the kernel function are linear, polynomial, and sigmoidal. In this study, we used the Gaussian Radial Basis Function:

K(x, xi) = exp( −‖x − xi‖² / (2σ²) )    (4)

where σ ∈ R is a kernel parameter and ‖x − xi‖ is the dissimilarity measure. With this set of formulas and concepts we are able to classify a set of data points into two classes with a non-linear decision function. The power of the method comes from using kernel functions, which enable it to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, simply by computing the inner products between the images of all pairs of data points in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates.

Figure 3 illustrates a non-linearly separable data set classified by SVMs with two different kernel functions: linear and radial basis. The observations are plotted in blue or magenta depending on the class, and the background is darker as the distance from the hyperplane increases. Scores are given in the bottom-right corners and show a significant increase for the non-linear kernel.

Figure 3: SVM results with two kernel functions.
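For readers who wish to reproduce an experiment in the spirit of Figure 3, the sketch below trains soft-margin SVMs with a linear and an RBF kernel on a non-linearly separable toy data set. It uses scikit-learn as a convenient substitute for the Matlab LibSVM example mentioned above; this choice, and the toy data set, are our own.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric classes: not separable by a hyperplane in the input space.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0, gamma="scale")  # gamma is ignored by the linear kernel
    clf.fit(X_train, y_train)
    print(kernel, "test accuracy:", clf.score(X_test, y_test))

# The RBF kernel implicitly lifts the data into a feature space where a
# separating hyperplane exists, hence its much higher score.
```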
3.2 One-Class Support Vector Machines

One-Class SVMs (OC-SVMs) are used to separate the data of one specific class, the target class, from other data. They are trained with positive examples only, i.e. data points from the target class. There are two different approaches: the approach of Schölkopf [13], which is presented in the next paragraph, and that of Tax and Duin [14].

In the feature space F, the OC-SVM method basically separates all the data points from the origin by a hyperplane and maximizes the distance of this hyperplane to the origin. This results in a binary function which captures the region of the input space where the training data lives: the function returns +1 in a "small" region (capturing the training data points) and −1 elsewhere. The quadratic programming minimization function is slightly different from the original one stated in (Eq. 2) and (Eq. 3):

min_{w, ξ, ρ}  ‖w‖²/2 + (1/(ηn)) Σ_{i=1}^{n} ξi − ρ    (5)

subject to: w · Φ(xi) ≥ ρ − ξi,  ξi ≥ 0,  i = 1, ..., n.

Schölkopf et al. [13] reformulated SVMs to take the new regularization parameter η instead of C in the original formulation (Eq. 2 and Eq. 3). The range of C is from zero to infinity, but η always lies in [0, 1]. η characterizes the solution in a nicely interpretable way: (1) it sets an upper bound on the fraction of outliers, i.e. the training examples regarded as out-of-class, and (2) it sets a lower bound on the fraction of training examples used as support vectors.

Again, by using Lagrange techniques and a kernel function for the dot-product calculations, the decision function becomes:

f(x) = sign( w · Φ(x) − ρ ) = sign( Σ_{i=1}^{n} αi K(x, xi) − ρ )    (6)

OC-SVMs thus create a hyperplane, characterized by w and ρ, which has maximal distance from the origin in the feature space F, hence separating all the data points from the origin.
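In practice this formulation is available off the shelf. A minimal sketch with scikit-learn's OneClassSVM is given below (the tooling is our assumption, the paper does not name its implementation); the parameter nu plays the role of the regularization parameter η and gamma that of the RBF kernel parameter.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import OneClassSVM


def fit_ocsvm(X_train: np.ndarray, gamma: float, nu: float):
    """X_train: (n_windows, 28) attribute matrix built as in Section 2,
    assumed to contain (mostly) normal behaviour only."""
    scaler = MinMaxScaler().fit(X_train)
    model = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu)
    model.fit(scaler.transform(X_train))
    return scaler, model


def predict_windows(scaler, model, X: np.ndarray) -> np.ndarray:
    """Return +1 for windows inside the learned support region, -1 for outliers."""
    return model.predict(scaler.transform(X))
```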
4 Similarity-based Performance Evaluation for Model Selection

In this section, we address the problem of fitting the hyper-parameters of OC-SVM automatically, that is, the problem of automatic model selection. In the case of OC-SVM, this amounts to choosing the kernel parameter γ and the regularization parameter η. A pair (γi, ηj) is defined as a learning configuration.

For this purpose, we propose to run OC-SVM for several learning configurations and select the best configuration by evaluating the similarity between the KPI signals of the windows tagged normal by OC-SVM and the KPI windows of the training data, which are assumed to be normal examples. Since a sample window is composed of several KPI signals, we need a multidimensional similarity index for Multivariate Time Series (MTS).

4.1 The Similarity Index Eros

Multidimensional similarity measures aim to indicate the level of similarity between several datasets (databases, data clusters, etc.) simultaneously. Unlike other methods [15; 16; 17] that assess the level of similarity between two variables while omitting the existing correlation between the set of variables, a multidimensional method takes into account the contribution of each variable in defining a global similarity measure. One of the methods designed for MTS is Eros (Extended Frobenius Norm) [18]. The interest of this method lies in its ability to assess the similarity of MTS composed of a different number of data points. It indeed uses the eigenvalues and eigenvectors of the covariance matrix, which has size n × n, n being the number of time series composing the MTS. In doing so, it also performs dimension reduction, because the number of observations is generally higher than the number of variables.

We briefly describe the similarity index Eros, based on the Frobenius norm, below. The definitions and notations used in this paper are taken from [19]. We first formally define the similarity index Eros. Next, we present the algorithm describing the similarity measure procedure and the approach proposed for model selection.

Definition 1. Eros (Extended Frobenius Norm). Let A and B be two MTS items of size mA × n and mB × n respectively. Let VA and VB be the two right eigenvector matrices obtained by applying Singular Value Decomposition (SVD) to their covariance matrices MA and MB, respectively. Let VA = [a1, ..., an] and VB = [b1, ..., bn], where ai and bi are column-orthonormal vectors of size n. The Eros similarity of A and B is then defined as:

Eros(A, B, w) = Σ_{i=1}^{n} wi |⟨ai, bi⟩| = Σ_{i=1}^{n} wi |cos θi|    (7)

where ⟨ai, bi⟩ is the inner product of ai and bi, w is a weight vector based on the eigenvalues of the MTS dataset with Σ_{i=1}^{n} wi = 1, and θi is the angle between ai and bi. The range of Eros is between 0 and 1, with 1 indicating the most similar.

Definition 2. Singular Value Decomposition. Let A be a general real m × n matrix. The singular value decomposition (SVD) of A is the factorization:

A = U Σ V^T    (8)

where U is a column-orthonormal m × r matrix, r is the rank of the matrix A, Σ is a diagonal r × r matrix of the eigenvalues γi of A with γ1 ≥ ... ≥ γr ≥ 0, and V is a column-orthonormal n × r matrix. The eigenvalues and the corresponding eigenvectors are sorted in non-increasing order. V is called the right eigenvector matrix, and U the left eigenvector matrix.

Yang and Shahabi [18] describe the similarity index algorithm with the following steps:

1. Compute the covariance matrix of each MTS.
2. Use SVD to decompose each covariance matrix.
3. Recover the eigenvalues and eigenvectors.
4. Compute the weight vector w by normalizing the eigenvalues [18].
5. Compute the similarity between MTS.

4.2 Automatic Model Selection

The first task is to define the learning configurations that will be tested with OC-SVM. We follow the steps below:

1. Define the hyper-parameter space and a procedure to explore this space. In our case, we set a min-max range and a variation step to constitute a grid of (β × β) value pairs, i.e. β values for each hyper-parameter.
2. Explore the hyper-parameter space and set OC-SVM accordingly: for each pair of values, one OC-SVM classifier is obtained after the learning step. The best configuration is retained by using the Eros similarity index on the validation data (25% of all data) and the training data (50% of all data). The corresponding OC-SVM classifier is taken as the best model.
3. Once the best model is found, anomaly detection is performed on new data to evaluate how well the model behaves.

The similarity between the windows tagged normal by OC-SVM, denoted MTS^normal_k, k = 0, ..., p, and the data windows of the training data (considered normal), denoted MTS^learn_l, l = 1, ..., q, is obtained as follows. For every learning configuration (γi, ηj) (Figure 4):

1. Compute Eros for every window pair (MTS^normal_k, MTS^learn_l), k = 0, ..., p, and l = 1, ..., q.
2. Compute the average similarity "Erosmean" over all the window pairs.

The best learning configuration is taken as the one leading to the maximal "Erosmean" value over all considered learning configurations (γi, ηj).

Figure 4: Diagram showing the model selection process.
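To make the procedure concrete, here is a minimal sketch of the Eros computation and of the selection loop, under stated assumptions rather than as the authors' implementation: the weight vector is taken as the normalized mean eigenvalues of the training windows (one of the weighting schemes discussed in [18]), and fit/predict stand for callables wrapping the attribute-extraction and OC-SVM steps sketched earlier.

```python
import numpy as np


def right_eigen(mts: np.ndarray):
    """mts: (m, n) multivariate time series (m observations, n KPIs).
    Returns the right eigenvector matrix V and the eigenvalues of the
    covariance matrix of mts, sorted in non-increasing order."""
    cov = np.cov(mts, rowvar=False)
    _, eigvals, vt = np.linalg.svd(cov)
    return vt.T, eigvals


def eros(mts_a: np.ndarray, mts_b: np.ndarray, w: np.ndarray) -> float:
    """Extended Frobenius Norm similarity (Eq. 7); result lies in [0, 1]."""
    va, _ = right_eigen(mts_a)
    vb, _ = right_eigen(mts_b)
    cosines = np.abs(np.sum(va * vb, axis=0))  # |<a_i, b_i>| column by column
    return float(np.sum(w * cosines))


def select_best_configuration(train_windows, valid_windows, gammas, nus, fit, predict):
    """Grid search over (gamma, nu): keep the configuration whose validation
    windows predicted normal are, on average, the most Eros-similar to the
    training windows (Section 4.2)."""
    # Weight vector from the eigenvalues of the training windows (assumption).
    mean_eigvals = np.mean([right_eigen(w_)[1] for w_ in train_windows], axis=0)
    weights = mean_eigvals / mean_eigvals.sum()

    best, best_score = None, -np.inf
    for gamma in gammas:
        for nu in nus:
            model = fit(train_windows, gamma, nu)
            labels = predict(model, valid_windows)
            normal = [w_ for w_, lab in zip(valid_windows, labels) if lab == +1]
            scores = [eros(a, b, weights) for a in normal for b in train_windows]
            eros_mean = np.mean(scores) if scores else -np.inf
            if eros_mean > best_score:
                best, best_score = (gamma, nu, model), eros_mean
    return best, best_score
```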
5 Experiments on the Case Study

Our detection approach was applied to the case study presented in Section 2. A history of two months of data generated every 5 minutes for four KPIs was collected over 151 sites. The window segmentation [20] was performed after analyzing two points that can significantly impact the detection stage:

• the choice of the time-window length, i.e. the number of hours and samples to take into account in a window,
• the definition of a reliable methodology to normalize the training and testing datasets.

As mentioned in Section 2, the time-window length was chosen as 4 hours, i.e. 48 samples. Since access to the web applications is typically established over a few hours, a window of four hours is considered a significant period for traffic analysis. As noted above, each time-window is characterized by seven statistical attributes: minimum (MIN), maximum (MAX), mean (MEAN), median (MED), standard deviation (STD), number of average crossings (nbpMean), and squared mean error (SME) computed between the raw data and their linear fit. The attributes are computed for each time-window in order to obtain a multidimensional scatter plot in which each point represents a time-window. One of the major interests of segmentation and feature computation is to synthesize the information contained in a time-window. This allows the detection not only of singular points, but also of an atypical set of points, even if each point taken individually appears normal.

The acquired raw data provide KPIs with different ranges, so the features (attributes) themselves do not have homogeneous ranges. In order to guarantee good performance of the anomaly detection approach, we chose to normalize these attributes with respect to their maximal and minimal values, with a tolerance set by a threshold s ∈ [0, 1]. This standard preprocessing ensures that all the attributes contribute equally to the decision process, independently of the parameters responsible for the KPI dynamics.

To automatically select the best model, the hyper-parameter space was discretized with a 10 × 10 grid, i.e. β = 10. 100 learning configurations were therefore evaluated to select the best model. This off-line task was performed for each application site and proved computationally feasible.

Figure 5 shows some of the test results (25% of all data). Out of 24 time-windows (wi ∈ {w0, ..., wm}, m = 23), 4 anomalies were detected, represented by the 4 yellow-colored time-windows w3, w4, w10 and w12. The results were confirmed with the Parallel Coordinates plots given in Figure 6.

Figure 5: Anomaly detection (yellow windows are detected abnormal). From top to bottom, KPIs appear in this order on the y-axis: Total Incoming Traffic, Total Outgoing Traffic, Server Delay, and Network Delay.

In a Parallel Coordinates Plot, each attribute is given its own axis and all the axes are placed parallel to each other. Values are plotted as a series of line segments that connect across all the axes, so that each line corresponds to one data window for which we have 7 × 4 attributes (7 features for every KPI). The order in which the axes are arranged can impact the way the reader understands the data. One reason for this is that the relationships between adjacent variables are easier to perceive than those between non-adjacent variables, so re-ordering the axes can help in discovering patterns or correlations across variables. We clearly see that the four time-windows drawn as pink lines exhibit a strange behavior compared to the normal windows drawn as green lines.

Figure 6: Parallel Coordinates Plot illustrating the four abnormal windows. From top to bottom, the curves labeled in pink show successively the time-windows w3, w4, w10 and w12.
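A Figure 6-style view can be reproduced with pandas' built-in parallel-coordinates helper, sketched below under the assumption that the per-window attributes are stored in a DataFrame with one row per window; the DataFrame layout and the label values are ours.

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates


def plot_window_attributes(features: pd.DataFrame, labels) -> None:
    """features: one row per time-window, one column per attribute (7 x 4 = 28).
    labels: iterable of "normal"/"abnormal" predictions, one per window."""
    df = features.copy()
    df["prediction"] = list(labels)
    # Colours are assigned to classes in their order of first appearance.
    parallel_coordinates(df, class_column="prediction",
                         color=("green", "deeppink"), alpha=0.6)
    plt.xticks(rotation=90)
    plt.tight_layout()
    plt.show()
```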
Presenting this type of detection can help network administrators adopt another line of reasoning to characterize the nature of the traffic circulating on the network (normal, abnormal, critical, ...). It may help them identify the different forms of anomaly present in their network. Data analysis must give meaning to the data, with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. The value of the data lies in the story it tells.

6 Conclusion

In this work, we applied the OC-SVM method to detect anomalies in real network traffic, contributing an automatic method based on the similarity index Eros [19] for setting the hyper-parameters which define the learning configuration. It provided very satisfactory results.

The advantages of novelty detection for complex processes like network traffic are multiple. In particular, there is no need for faulty data: a wide variety of anomaly cases exist and it would be impossible to characterize them all or to gather the corresponding data.

Challenges for future work relate to the fact that data comes in a stream, and dealing with the data in real time is quite tedious. The amount of data leads to cases where resources are limited. Novelty detection in a distributed framework also remains to be investigated.

Acknowledgement

The authors thank Bertrand Le Marec and David Maisonneuve, leading team of LivingObjects, for their support and valuable comments about the application.

References

[1] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41(3):15:1–15:58, 2009.
[2] V. Kumar. Parallel and distributed computing for cybersecurity. IEEE Distributed Systems Online, 6(10):1–9, 2005.
[3] C. Spence, L. Parra, and P. Sajda. Detection, synthesis and compression in mammographic image analysis with a hierarchical image probability model. In Proceedings of the IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA'01), pages 3–, 2001.
[4] E. Aleskerov, B. Freisleben, and B. Rao. Cardwatch: A neural network based database mining system for credit card fraud detection. In Proceedings of the IEEE/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr), pages 220–226, 1997.
[5] R. Fujimaki, T. Yairi, and K. Machida. An approach to spacecraft anomaly detection problem using kernel feature space. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD'05), pages 401–410, New York, NY, USA, 2005. ACM.
[6] C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.
[7] C.M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[8] W. Wang, Z. Xu, W. Lu, and X. Zhang. Determination of the spread parameter in the Gaussian kernel for classification and regression. Neurocomputing, 55(3-4):643–663, 2003.
[9] M.A.F. Pimentel, D.A. Clifton, L. Clifton, and L. Tarassenko. A review of novelty detection. Signal Processing, 99:215–249, 2014.
[10] D. Dasgupta and S. Forrest. Novelty detection in time series data using ideas from immunology. In Proceedings of the 5th International Conference on Intelligent Systems, Reno, Nevada, 1996.
[11] E. Keogh, S. Lonardi, and W. Chiu. Finding surprising patterns in a time series database in linear time and space. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'02), pages 550–556, New York, NY, USA, 2002. ACM.
[12] B. Schölkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001.
[13] B. Schölkopf, R. Williamson, A. Smola, J. Shawe-Taylor, and J. Platt. Support vector method for novelty detection. In Proceedings of the 12th International Conference on Neural Information Processing Systems, pages 582–588, 1999.
[14] D.M.J. Tax and R.P.W. Duin. Support vector data description. Machine Learning, 54(1):45–66, 2004.
[15] G.E.A.P.A. Batista, W. Wang, and E.J. Keogh. A complexity-invariant distance measure for time series. In Proceedings of the SIAM International Conference on Data Mining (SDM), 2011.
[16] C.A. Ratanamahatana and E.J. Keogh. Making time-series classification more accurate using learned constraints. In Proceedings of the SIAM International Conference on Data Mining (SDM'04), pages 11–22, 2004.
[17] S. Park, W.W. Chu, J. Yoon, and C. Hsu. Efficient searches for similar subsequences of different lengths in sequence databases. In 16th International Conference on Data Engineering, pages 23–32, 2000.
[18] K. Yang and C. Shahabi. A multilevel distance-based index structure for multivariate time series. In 12th International Symposium on Temporal Representation and Reasoning, 2005.
[19] K. Yang and C. Shahabi. A PCA-based similarity measure for multivariate time series. In Proceedings of the Second ACM International Workshop on Multimedia Databases, 2004.
[20] S. Fuertes, G. Picart, J.-Y. Tourneret, L. Chaari, A. Ferrari, and C. Richard. Improving spacecraft health monitoring with automatic anomaly detection techniques. In 14th International Conference on Space Operations, page 2430, 2016.