=Paper=
{{Paper
|id=Vol-1353/paper_04
|storemode=property
|title=Image Reduction Using Assorted Dimensionality Reduction Techniques
|pdfUrl=https://ceur-ws.org/Vol-1353/paper_04.pdf
|volume=Vol-1353
|dblpUrl=https://dblp.org/rec/conf/maics/NsangBS15
}}
==Image Reduction Using Assorted Dimensionality Reduction Techniques==
Image Reduction Using Assorted Dimensionality Reduction Techniques

Augustine S. Nsang, Abdullahi Musa Bello, Hammed Shamsudeen
Department of Computer Science, School of Inf Tech and Computing, American University of Nigeria, Yola By-Pass, PMB 2250, Yola, Nigeria
augustine.nsang@aun.edu.ng, abdullahi.bello@aun.edu.ng, shamsudeen.hammed@aun.edu.ng

Abstract

Dimensionality reduction is the mapping of data from a high dimensional space to a lower dimensional space such that the result obtained by analyzing the reduced dataset is a good approximation to the result obtained by analyzing the original data set. There are several dimensionality reduction approaches, which include Random Projections, Principal Component Analysis, the Variance approach, LSA-Transform, the Combined and Direct approaches, and the New Random Approach. In this paper, we propose three new techniques, each of which is a modified version of the last three techniques mentioned above (the Combined and Direct approaches, and the New Random Approach). We shall implement each of the ten reduction techniques mentioned, after which we shall use these techniques to compress various pictures. Finally, we shall compare the ten reduction techniques implemented in this paper with each other by the extent to which they preserve images.

Index Terms— dimensionality reduction, image compression, principal component analysis

1. Introduction

Given a collection of n data points (vectors) in high dimensional space, it is often helpful to be able to project it into a lower dimensional space without suffering great distortion (NR 2010a). In other words, it is helpful if we can embed a set of n points in d-dimensional space into a k-dimensional space, where k << d. This operation is known as dimensionality reduction.

There are many known methods of dimensionality reduction. These include Random Projection (RP), Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), Discrete Cosine Transform (DCT) and Latent Semantic Analysis (LSA) (NR 2009). For each of these methods, each attribute in the reduced set is a linear combination of the attributes in the original data set. Other dimensionality reduction methods, however, reduce a dataset to a subset of the original attribute set. These include the Combined Approach (CA), the Direct Approach (DA), the Variance Approach (Var), LSA-Transform, the New Top-Down Approach (NTDn), the New Bottom-Up Approach (NBUp), the Weighted Attribute Frequency Approach (WAF) and the Best Clustering Performance Approach (BCP) (Nsang 2011).

Dimensionality reduction has several advantages, the most important of which is the fact that with dimensionality reduction, we could drastically speed up the execution of an algorithm whose runtime depends exponentially on the dimensions of the working space. At the same time, the solution found by working in the low dimensional space is a good approximation to the solution in the original high dimensional space.

One application of dimensionality reduction is in the compression of image data. In this domain, digital images are stored as 2D matrices which represent the brightness of each pixel. Usually, the matrix representing an image can be quite large, and for this reason it could be very time consuming to query this matrix for information about the features of the image. In this paper, dimensionality reduction techniques are used to reduce the matrix representation of an image. This makes it possible to query the reduced matrix to get any information about the original image. Besides, we can use these techniques to compress all the pictures we have in a given folder or website, thus conserving memory.

The rest of the paper is organized as follows. In Section 2, we shall examine the different dimensionality reduction techniques that shall be used to reduce the images. In Section 3 we shall look at the effects of reducing images using each of these techniques, and in Section 4 we shall compare the ten reduction techniques implemented in this project with each other by the extent to which they preserve images, and by their speeds of execution. Then we shall conclude this paper in Section 5.
2. Dimensionality Reduction Techniques

In this section, we shall examine the different reduction techniques that we will be using to reduce the images. They include the following.

2.1 Random Projection

In Random Projection, the original d-dimensional data is projected to a k-dimensional (k << d) subspace through the origin, using a random d x k matrix R whose columns have unit lengths (Bingham E. 2001). If X_nxd is the original set of n d-dimensional observations, then

X^RP_nxk = X_nxd R_dxk

is the projection of the data onto a lower k-dimensional subspace. The key idea of random mapping arises from the Johnson-Lindenstrauss lemma (Johnson W. B.), which states that if points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved.
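For illustration, the projection step can be written directly in MATLAB, the environment used for the experiments in Section 3. This is a minimal sketch under the definitions above, not the authors' code; the function name random_projection and the use of randn and vecnorm are our own choices.

    function Xrp = random_projection(X, k)
        % X is the n-by-d data matrix; k is the target dimensionality (k << d)
        d = size(X, 2);
        R = randn(d, k);          % random d-by-k matrix
        R = R ./ vecnorm(R);      % scale each column of R to unit length
        Xrp = X * R;              % project the data onto the k-dimensional subspace
    end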
2.2 Principal Component Analysis (PCA)

Given n data points in p dimensions as an n x p matrix X, we want to find the best q-dimensional approximation for the data (q << p). The PCA approach achieves this by first computing the Singular Value Decomposition of X. In other words, it finds matrices U, D and V such that X = UDV^T, where: U is an n x n orthogonal matrix (i.e. U^T U = I_n) whose columns are the left singular vectors of X; V is a p x p orthogonal matrix (i.e. V^T V = I_p) whose columns are the right singular vectors of X; and D is an n x p diagonal matrix with diagonal elements d1 ≥ d2 ≥ d3 ≥ … ≥ dp ≥ 0, which are the singular values of X. Note that the bottom rows of D are zero rows.

Define Uq to be the matrix whose columns are the left singular vectors corresponding to the q largest singular values of X; Uq is an n x q matrix. The transformed matrix is given by (Bingham E. 2001):

X_SVD = X^T Uq

2.3 The Variance Approach (NR 2010b)

With the Variance approach, to reduce a dataset D to a dataset DR, we start with an empty set, I, and then add dimensions of D to this set in decreasing order of their variances. That means that a set I of r dimensions will contain the dimensions with the top r variances. Intuitively, it is easy to justify why dimensions of low variance are left out, as they would fail to discriminate between the data. (Indeed, in an extreme case where all the values along a dimension are equal, the variance is 0, and hence this dimension cannot distinguish between the data points.) Thus, let

Ir = {i1, . . . , ir} ⊂ {1, . . . , n},

be the collection of dimensions corresponding to the top r variances. That is, i1 denotes the dimension of largest variance, i2 the dimension of next largest variance, etc. The reduced database is obtained by extracting the data corresponding to the selected dimensions. That is, project D on Ir to obtain:

DR = D(:, Ir),

where DR has the same number of rows as D and r columns: the ith column of DR is the column of the original database with the ith largest variance.

2.4 LSA-Transform (Nsang 2011)

LSA-Transform is probably the best technique for reducing image data. It makes use of the redundancy of the data in the matrices that represent images in practice. Specifically, if I is an image, and M is the matrix (of pixel brightness values) representing I, LSA-Transform simply selects only the even columns and rows of M to give M1. The simple explanation for this is as follows: one point on an image is usually represented by a whole rectangle of values in the corresponding matrix. For instance, a dark point may be represented by the values:

93 94 88 93
87 89 89 89
87 83 88 88
93 89 88 89

Each of these values, as we can see, is less than 95. Selecting only the even rows leaves us with:

87 89 89 89
93 89 88 89

Similarly, selecting only the even columns leaves us with:

89 89
89 89

Thus the original sub-matrix of sixteen cells becomes reduced to a smaller matrix of four cells, which also represents one dark point on an image. After the execution of LSA-Transform, therefore, M and M1 become two matrices representing I, one a quarter the size of the other.
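The Variance approach and LSA-Transform are short enough to state directly in MATLAB. The sketch below is ours rather than the authors' implementation; it assumes D is a numeric n-by-p matrix and M a matrix of pixel brightness values, and the function names are hypothetical.

    function DR = variance_reduce(D, r)
        % Variance approach: keep the r columns of D with the largest variances
        [~, order] = sort(var(D), 'descend');   % column variances, largest first
        Ir = order(1:r);                        % dimensions of the top r variances
        DR = D(:, Ir);
    end

    function M1 = lsa_transform(M)
        % LSA-Transform: keep only the even rows and even columns of M
        M1 = M(2:2:end, 2:2:end);
    end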
2.5 The Combined Approach (NR 2010b)

Like the two previous approaches, the Combined Approach reduces a dataset D to a subset of the original attribute set. To reduce a dataset D_nxp to a dataset containing k columns, the Combined Approach selects the combination of k attributes which best preserves the interpoint distances, and reduces the dataset to a dataset containing only those k attributes. To do so, it first determines the extent to which each attribute preserves the interpoint distances. In other words, for each attribute x in D, it computes gxm and gxM given by:

gxm = min { ||f(u) − f(v)||² / ||u − v||² }
gxM = max { ||f(u) − f(v)||² / ||u − v||² }

where u and v are any two rows of D, and f(u) and f(v) are the corresponding rows in the dataset reduced to the single attribute x. The average distance preservation for the attribute x is then computed as:

gxmid = (gxm + gxM) / 2

To reduce the dataset D from p columns to k columns, this approach then finds the combination of k attributes whose average value of gxmid is maximum.

2.6 The Direct Approach (NR 2010b)

As with the Combined Approach, to reduce a dataset D_nxp to a dataset containing k columns, the Direct Approach selects the combination of k attributes which best preserves the interpoint distances, and reduces the original dataset to a dataset containing only those k attributes. To do so, it first generates all possible combinations of k attributes from the original p attributes. Then, for each combination C, it computes gcm and gcM given by:

gcm = min { ||f(u) − f(v)||² / ||u − v||² }
gcM = max { ||f(u) − f(v)||² / ||u − v||² }

where u and v are any two rows of D, and f(u) and f(v) are the corresponding rows in the dataset reduced to the attributes in C. The average distance preservation for this combination of attributes is then computed as:

gcmid = (gcm + gcM) / 2

To reduce the dataset D from p attributes to k attributes, this approach then finds the combination of k attributes whose value of gcmid is maximum.

As we can see, the difference between the Combined and Direct Approaches is that for the Combined Approach, we first find the average distance preservation for each attribute, and then, for any combination of attributes, we compute its average distance preservation by averaging the distance preservations of the individual attributes. With the Direct Approach, on the other hand, to find the average distance preservation for any combination of attributes C, we reduce the original dataset directly to the dataset containing only the attributes in C, and then compute the average distance preservation for this combination using the formulas above.
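The quantity at the heart of both approaches can be sketched in MATLAB as follows. This is our own illustration, not the authors' code: D is the dataset and C a vector of column indices; with a single index it yields the per-attribute estimate gxmid of the Combined Approach, and with k indices it yields gcmid for a combination as in the Direct Approach. Pairs of identical rows are skipped, an assumption the paper does not spell out.

    function gmid = preservation_estimate(D, C)
        % Midpoint of the minimum and maximum distance-preservation ratios
        n = size(D, 1);
        F = D(:, C);                              % dataset reduced to the attributes in C
        gm = Inf;  gM = -Inf;
        for u = 1:n-1
            for v = u+1:n
                d2 = sum((D(u,:) - D(v,:)).^2);   % squared distance in the original data
                if d2 == 0, continue; end         % skip coincident rows (our assumption)
                r = sum((F(u,:) - F(v,:)).^2) / d2;
                gm = min(gm, r);
                gM = max(gM, r);
            end
        end
        gmid = (gm + gM) / 2;
    end

Both approaches then search over all nchoosek(p, k) combinations for the one that maximizes this estimate (averaged over the individual attributes in the case of the Combined Approach), which is what makes them expensive to run.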
2.7 The New Random Approach

This is a technique suggested by Nsang, Maikori, Oguntoyinbo and Yusuf in (NMOY 2015). With this technique, to reduce a data set D of dimensionality d to one of dimensionality k, a set Sk is formed consisting of k numbers selected at random from the set S given by:

S = {x ϵ N | 1 ≤ x ≤ d}

Then, our reduced set, DR, will be given by:

DR = D(:, Sk)

That is, DR is a data set having the same number of rows as D, and if Ai is the ith attribute of DR, then Ai is the jth attribute of D if j is the ith element of Sk.

2.8 The Modified Combined Approach

As we saw in Section 2.5 above, the Combined Approach computes the average distance preservation of a combination of attributes by computing the average of their gxmid values. It is very clear that for any given attribute x, gxmid is only an estimate of its average distance preservation, since it is computed as the midpoint between gxm, the minimum distance preservation, and gxM, the maximum distance preservation.

The modified version of the Combined Approach improves on the original version by computing the average distance preservation of a combination of attributes as the average of the actual distance preservations of each attribute. If x is an attribute of a dataset D, the actual distance preservation of x is computed as:

g_x = [ Σ_{u=1..n} Σ_{v=u+1..n} ||f(u) − f(v)||² / ||u − v||² ] / nr

where n is the number of rows of D, u and v are any two rows of D, and f(u) and f(v) are the corresponding rows in the dataset reduced to the single attribute x. The term nr in this equation is the number of pairs of rows of D, computed as:

nr = C(n, 2) = n(n − 1) / 2.

Thus, for any combination of attributes C of D, the average distance preservation of C is given as:

g_C = ( Σ_{x ϵ C} g_x ) / nC

where nC is the number of attributes in C. Therefore, to reduce a dataset D from p columns to k columns, the modified version of the Combined Approach finds the combination C of k attributes of D whose value of g_C is maximum.

2.9 The Modified Direct Approach

Like the Direct Approach, to reduce a dataset D_nxp to a dataset containing k columns, the modified version of the Direct Approach selects the combination of k attributes which best preserves the interpoint distances, and reduces the original dataset to a dataset containing only those k attributes. To do so, it first generates all possible combinations of k attributes from the original p attributes. However, for each combination C, instead of estimating its average distance preservation using its gcmid value, it computes the actual average distance preservation of C using the following formula:

g_C = [ Σ_{u=1..n} Σ_{v=u+1..n} ||f(u) − f(v)||² / ||u − v||² ] / nr

where n is the number of rows of D, u and v are any two rows of D, and f(u) and f(v) are the corresponding rows in the dataset D reduced to the attributes of C. Once again, the term nr in this equation is the number of pairs of rows of D, computed as nr = C(n, 2) = n(n − 1)/2. Therefore, to reduce a dataset D from p columns to k columns, the modified version of the Direct Approach finds the combination C of k attributes of D whose value of g_C is maximum.

2.10 The Modified New Random Approach

This technique is suggested as an improvement of the New Random Approach discussed in Section 2.7 above. To reduce a dataset D_nxp from p attributes to k attributes using the modified version of the New Random Approach, we use the algorithm below. Clearly, the idea here is to generate a result which is less random (and thus more efficient) than the result of the New Random Approach. Note that m in the algorithm is the number of times the execution of the New Random Approach is repeated.

Algorithm

M = []
for i = 1 to m do
    Run the New Random Approach to generate k numbers at random in the range 1..p
    Store the list of numbers generated as the ith row of M
end
Generate the one-dimensional matrix M1 with p entries such that M1[j] holds the frequency of the number j in the matrix M
Finally, generate the matrix Result which contains the k numbers of highest frequency in M, arranged in ascending order

Thus, if D is the original dataset, the result of reducing D using the modified version of the New Random Approach is given by:

DR = D(:, Result)

Below is the result of a sample run of the program with D as given in Table 1 below (and with m = 4):

Table 1: The D Dataset

i) After the first run of NRA:
M = [9 2 3 7 5 1 10]

ii) After the second run of NRA:
M = [9 2 3 7 5 1 10
     6 4 5 3 2 8 7]

iii) After the third run of NRA:
M = [9 2 3 7 5 1 10
     6 4 5 3 2 8 7
     1 2 9 4 8 5 3]

iv) After the fourth run of NRA:
M = [9 2 3 7 5 1 10
     6 4 5 3 2 8 7
     1 2 9 8 4 5 3
     2 5 6 8 1 7 10]

Thus M1 = [3 4 3 2 4 2 3 3 2 2] and Result = [1 2 3 4 5 7 8].

Thus the result of reducing the dataset D using the modified version of the New Random Approach is the dataset DR given in Table 2 below.

Table 2: The DR Dataset
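The algorithm above translates almost line for line into MATLAB. The following is our sketch, not the authors' program; it keeps only the frequency vector M1 rather than the full matrix M, and tie-breaking among equally frequent indices is left to sort, a detail the paper does not specify.

    function DR = modified_new_random(D, k, m)
        % Modified New Random Approach: repeat the New Random Approach m times,
        % count how often each column index is chosen, and keep the k most frequent
        p = size(D, 2);
        M1 = zeros(1, p);                    % M1(j) = frequency of index j so far
        for i = 1:m
            Sk = randperm(p, k);             % one run of the New Random Approach
            M1(Sk) = M1(Sk) + 1;
        end
        [~, order] = sort(M1, 'descend');    % indices by decreasing frequency
        Result = sort(order(1:k));           % k most frequent indices, ascending
        DR = D(:, Result);
    end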
3.0 Reducing Images Using These Techniques

In this section, we shall use each of the techniques examined in Section 2 above to reduce images, and the effects of each reduction will be presented. To achieve this aim, we shall make use of the MATLAB function imread, which converts an image into a matrix, and imshow, which converts a matrix of pixel brightness values back into an image. (In other words, the function of imread is the reverse of the function of imshow.)

3.1 Random Projection

Random Projection is useless in preserving images. The simple reason is that when we multiply the matrix representing an image by a random matrix R, the resulting matrix typically has values outside the range of pixel brightness values. In our experiment, this is the result we obtained:

Original Image: [figure]
Reduced Image: [figure]

3.2 Principal Component Analysis (PCA)

Like RP, PCA is useless in preserving images. The following is the result obtained when we tried to reduce an image using PCA:

Original Image: [figure]
Reduced Image: [figure]

3.3 Variance

Apart from RP and PCA, all the other methods we implemented were reasonably efficient in preserving images. With the Variance method, the results obtained are displayed below:

Original Image: [figure]
Reduced Image: [figure]

3.4 Combined Approach

Original Image: [figure]
Reduced Image: [figure]

3.5 Direct Approach

Original Image: [figure]
Reduced Image: [figure]

3.6 The Modified Combined Approach

Original Image: [figure]
Reduced Image: [figure]

3.7 The Modified Direct Approach

Original Image: [figure]
Reduced Image: [figure]

3.8 LSA-Transform

Original Image: [figure]
Reduced Image: [figure]

3.9 The New Random Approach

Original Image: [figure]
Reduced Image: [figure]

3.10 New Random Approach (Modified Version)

Original Image: [figure]
Reduced Image: [figure]

Remark:

As can be observed from the results above, some reduction methods (such as the Combined and Direct Approaches and their modified versions) maintain the sizes of the original image while others do not.
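For concreteness, the imread/imshow pipeline described in Section 3.0 can be sketched as below, using LSA-Transform as the example reduction. This is our own outline rather than the authors' script; the file name is a placeholder, and rgb2gray is only needed when the input image is in colour.

    I = imread('sample.jpg');        % convert the image into a matrix (placeholder file name)
    if size(I, 3) == 3
        I = rgb2gray(I);             % keep a single brightness channel
    end
    M = I;
    M1 = M(2:2:end, 2:2:end);        % LSA-Transform: even rows and columns only
    figure, imshow(M),  title('Original Image');
    figure, imshow(M1), title('Reduced Image');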
4.0 Comparisons Between the Different Dimensionality Reduction Techniques

As mentioned above, there are two types of reduction techniques: those in which each attribute of the reduced set is a linear combination of the attributes in the original data set, and those which reduce a dataset to a subset of the original attribute set. Of the ten techniques implemented in this paper, two (RP and PCA) belong to the first category, and as we have seen, they are both useless in preserving images. This applies to almost every technique in this category; two exceptions are Two-Dimensional PCA and the Discrete Cosine Transform (Nsang 2011). The other eight techniques implemented in this paper belong to the second category, and as we have seen, they are all efficient in preserving images.

Of these eight techniques, as mentioned above, LSA-Transform is probably the best in preserving images. Apart from the fact that the quality of the reduced image is practically the same as the quality of the original image, its speed of execution is very high. All the other seven techniques also significantly maintain the quality of the original image, especially when most of the attributes of the matrix representing the original image are maintained, for instance when the number of attributes of the matrix representing the reduced image is at least 90% of the number of attributes of the matrix representing the original image. However, if we reduce the matrix representing the original image to, say, 60% (as is the case in this paper), some of these methods are more efficient than others in preserving the original image. From best to worst (as we can see from the results above), we have the New Random Approach, the Modified New Random Approach, and the Variance Approach, followed by the Direct and Combined Approaches and their modified versions. Interestingly, these last four approaches are also the least time efficient. As a matter of fact, these last four approaches could take many days to run!

Because the Combined and Direct approaches and their modified versions have high run-time complexities, they are only suitable for reducing small images. The Variance approach, on the other hand, has the lowest run-time complexity (apart from LSA-Transform, of course), which makes it the most suitable for reducing large images. Obviously, the New Random Approach and its modified version are also suitable for reducing large images.

5.0 Conclusion and Future Work

In this paper, we have studied two categories of dimensionality reduction techniques: those in which each attribute in the reduced set is a linear combination of the attributes in the original set, and those which reduce a data set to a proper subset of the original attribute set. As we have realized, while most of the techniques in the first category are useless in preserving images, every technique in the second category can be used to preserve images. Image preservation is a very important application of dimensionality reduction, as it enables us to conserve memory and to improve the speed of execution of any programs which use these images. We also noticed that while LSA-Transform, the New Random Approach, the Modified New Random Approach, and the Variance Approach have low run-time complexities and can be used to reduce large images, the Direct and Combined Approaches and their modified versions have very large run-time complexities and can only be used to reduce small images.

We shall compare the extents to which each method discussed in this paper preserves the interpoint distances and k-means clustering of different datasets in another work. We shall also analyze the time complexity of each of the ten techniques implemented in this paper. We could not do these in this paper due to time and space constraints.

References

Nsang A., Ralescu A. 2010. Approaches to Dimensionality Reduction to a Subset of the Original Dimensions. In Proceedings of the Twenty-First Midwest Artificial Intelligence and Cognitive Science Conference (MAICS 2010), South Bend, IN, 70-77.

Nsang A., Ralescu A. 2009. A Review of Dimensionality Reduction and Their Applications. In Proceedings of the Twentieth Midwest Artificial Intelligence and Cognitive Science Conference (MAICS 2009), Fort Wayne, IN, 118-123.

Nsang A. 2011. Novel Approaches to Dimensionality Reduction and Applications, An Empirical Study. Lambert Academic Publishing.

Bingham E., Mannila H. 2001. Random Projection in Dimensionality Reduction: Applications to Image and Text Data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 245-250.

Johnson W. B., Lindenstrauss J. 1984. Extensions of Lipschitz Mappings into a Hilbert Space. Contemporary Mathematics.

Nsang A., Ralescu A. 2010. More Dimensionality Reduction to a Subset of the Original Attribute Set. In Proceedings of the Twenty-First Midwest Artificial Intelligence and Cognitive Science Conference (MAICS 2010), South Bend, IN, 109-116.

Nsang A., Maikori A., Oguntoyinbo F., Yusuf H. 2015. A New Random Approach to Dimensionality Reduction. Unpublished.