       Image Reduction Using Assorted Dimensionality Reduction Techniques

          Augustine S. Nsang, Abdullahi Musa Bello and Hammed Shamsudeen
                         Department of Computer Science
                  School of Information Technology and Computing
                         American University of Nigeria
                      Yola By-Pass, PMB 2250, Yola, Nigeria
  augustine.nsang@aun.edu.ng, abdullahi.bello@aun.edu.ng, shamsudeen.hammed@aun.edu.ng
 

                         Abstract

     Dimensionality reduction is the mapping of data from a high dimensional space to a lower dimensional space such that the result obtained by analyzing the reduced dataset is a good approximation to the result obtained by analyzing the original data set.
     There are several dimensionality reduction approaches, which include Random Projections, Principal Component Analysis, the Variance approach, LSA-Transform, the Combined and Direct approaches, and the New Random Approach.
     In this paper, we propose three new techniques, each of which is a modified version of one of the last three techniques mentioned above (the Combined and Direct approaches, and the New Random Approach). We shall implement each of the ten reduction techniques mentioned, after which we shall use these techniques to compress various pictures. Finally, we shall compare the ten reduction techniques implemented in this paper with each other by the extent to which they preserve images.

   Index Terms— dimensionality reduction, image compression, principal component analysis

1. Introduction

     Given a collection of n data points (vectors) in high dimensional space, it is often helpful to be able to project it into a lower dimensional space without suffering great distortion (NR 2010a). In other words, it is helpful if we can embed a set of n points in d-dimensional space into a k-dimensional space, where k << d. This operation is known as dimensionality reduction.
     There are many known methods of dimensionality reduction. These include Random Projection (RP), Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), Discrete Cosine Transform (DCT) and Latent Semantic Analysis (LSA) (NR 2009). For each of these methods, each attribute in the reduced set is a linear combination of the attributes in the original data set.
     Other dimensionality reduction methods, however, reduce a dataset to a subset of the original attribute set. These include the Combined Approach (CA), the Direct Approach (DA), the Variance Approach (Var), LSA-Transform, the New Top-Down Approach (NTDn), the New Bottom-Up Approach (NBUp), the Weighted Attribute Frequency Approach (WAF) and the Best Clustering Performance Approach (BCP) (Nsang 2011).
     Dimensionality reduction has several advantages, the most important of which is the fact that with dimensionality reduction, we can drastically speed up the execution of an algorithm whose runtime depends exponentially on the dimension of the working space. At the same time, the solution found by working in the low dimensional space is a good approximation to the solution in the original high dimensional space.
     One application of dimensionality reduction is the compression of image data. In this domain, digital images are stored as 2D matrices which represent the brightness of each pixel. The matrix representing an image can be quite large, and for this reason it can be very time consuming to query this matrix for information about the features of the image. In this paper, dimensionality reduction techniques are used to reduce the matrix representation of an image. This makes it possible to query the reduced matrix to get any information about the original image. Besides, we can use these techniques to compress all the pictures we have in a given folder or website, thus conserving memory.
     The rest of the paper is organized as follows. In Section 2, we shall examine the different dimensionality reduction techniques that shall be used to reduce the images. In Section 3 we shall look at the effects of reducing images using each of these techniques, and in Section 4 we shall compare the ten reduction techniques implemented in this paper with each other by the extent to which they preserve images, and by their speeds of execution. Then we shall conclude this paper in Section 5.

2. Dimensionality Reduction Techniques

     In this section, we shall examine the different reduction techniques that we will be using to reduce the images. They include the following:
2.1 Random Projection

     In Random Projection, the original d-dimensional data is projected to a k-dimensional (k << d) subspace through the origin, using a random d x k matrix R whose columns have unit lengths (Bingham E. 2001). If Xnxd is the original set of n d-dimensional observations, then

      XRPnxk = Xnxd Rdxk

is the projection of the data onto a lower k-dimensional subspace.
     The key idea of random mapping arises from the Johnson-Lindenstrauss lemma (Johnson W. B.), which states that if points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved.

2.2 Principal Component Analysis (PCA)

     Given n data points in R^p as an n x p matrix X, we want to find the best q-dimensional approximation for the data (q << p). The PCA approach achieves this by first computing the Singular Value Decomposition of X. In other words, it finds matrices U, D and V such that X = UDV^T where:
     • U is an n x n orthogonal matrix (i.e. U^T U = In) whose columns are the left singular vectors of X;
     • V is a p x p orthogonal matrix (i.e. V^T V = Ip) whose columns are the right singular vectors of X;
     • D is an n x p diagonal matrix with diagonal elements d1 ≥ d2 ≥ d3 ≥ … ≥ dp ≥ 0, which are the singular values of X. Note that the bottom rows of D are zero rows.
     • Define Uq to be the matrix whose columns are the left singular vectors corresponding to the q largest singular values of X. Uq is an n x q matrix.
The transformed matrix is given by (Bingham E. 2001):

      XSVD = X^T Uq

2.3 The Variance Approach (NR 2010b)

     With the Variance approach, to reduce a dataset D to a dataset DR, we start with an empty set, I, and then add dimensions of D to this set in decreasing order of their variances. That means that a set I of r dimensions will contain the dimensions of the top r variances. Intuitively, it is easy to justify why dimensions of low variance are left out, as they would fail to discriminate between the data. (Indeed, in an extreme case where all the values along a dimension are equal, the variance is 0, and hence this dimension cannot distinguish between data points.) Thus, let

      Ir = {i1, . . . , ir} ⊂ {1, . . . , d},

the collection of dimensions corresponding to the top r variances, where d is the dimensionality of D: i1 denotes the dimension of largest variance, i2 the dimension of next larger variance, etc. The reduced database is obtained by extracting the data corresponding to the selected dimensions. That is, project D on Ir to obtain:

      DR = D(:, Ir),

where DR has the same number of rows as D and r columns: the ith column of DR is the column of the original database with the ith largest variance.

2.4 LSA-Transform (Nsang 2011)

     LSA-Transform is probably the best technique for reducing image data. It makes use of the redundancy of the data in matrices that represent images in practice. Specifically, if I is an image, and M is the matrix (of pixel brightness values) representing I, LSA-Transform simply selects only the even columns and rows of M to give M1. The simple explanation for this is as follows: one point on an image is usually represented by a whole rectangle of values in the corresponding matrix. For instance, a dark point may be represented by the values:

       93 94 88 93
       87 89 89 89
       87 83 88 88
       93 89 88 89

Each of these values, as we can see, is less than 95. Selecting only the even rows leaves us with:

       87 89 89 89
       93 89 88 89

Similarly, selecting only the even columns leaves us with:

       89 89
       89 89

Thus the original sub-matrix of sixteen cells becomes reduced to a smaller matrix of four cells, which also represents one dark point on an image. After the execution of LSA-Transform, therefore, M and M1 become two matrices representing I, one a quarter the size of the other.
2.5 The Combined Approach (NR 2010b)

     Like the two previous approaches, the Combined Approach reduces a dataset D to a subset of the original attribute set.
     To reduce a dataset Dnxp to a dataset containing k columns, the Combined Approach selects the combination of k attributes which best preserves the interpoint distances, and reduces the dataset to a dataset containing only those k attributes. To do so, it first determines the extent to which each attribute preserves the interpoint distances. In other words, for each attribute, x, in D, it computes gxm and gxM given by:

      gxm = min{ ||f(u) - f(v)||^2 / ||u - v||^2 }
      gxM = max{ ||f(u) - f(v)||^2 / ||u - v||^2 }

where u and v are any two rows of D, and f(u) and f(v) are the corresponding rows in the dataset reduced to the single attribute x. The average distance preservation for the attribute x is then computed as:

      gxmid = (gxm + gxM)/2

To reduce the dataset D from p columns to k columns, this approach then finds the combination of k attributes whose average value of gxmid is maximum.

2.6 The Direct Approach (NR 2010b)

     As with the Combined Approach, to reduce a dataset Dnxp to a dataset containing k columns, the Direct Approach selects the combination of k attributes which best preserves the interpoint distances, and reduces the original dataset to a dataset containing only those k attributes. To do so, it first generates all possible combinations of k attributes from the original p attributes. Then, for each combination, C, it computes gcm and gcM given by:

      gcm = min{ ||f(u) - f(v)||^2 / ||u - v||^2 }
      gcM = max{ ||f(u) - f(v)||^2 / ||u - v||^2 }

where u and v are any two rows of D, and f(u) and f(v) are the corresponding rows in the dataset reduced to the attributes in C. The average distance preservation for this combination of attributes is then computed as:

      gcmid = (gcm + gcM)/2

To reduce the dataset D from p attributes to k attributes, this approach then finds the combination of k attributes whose value of gcmid is maximum.

     As we can see, the difference between the Combined and Direct Approaches is that for the Combined Approach, we first find the average distance preservation for each attribute, and then, for any combination of attributes, we compute its average distance preservation by averaging the distance preservations of the individual attributes. With the Direct Approach, on the other hand, to find the average distance preservation for any combination of attributes, C, we reduce the original dataset directly to the dataset containing only the attributes in C, and then compute the average distance preservation for this combination using the formulas above.

2.7 The New Random Approach

     This is a technique suggested by Nsang, Maikori, Oguntoyinbo and Yusuf in (NMOY 2015). With this technique, to reduce a data set D of dimensionality d to one of dimensionality k, a set Sk is formed consisting of k numbers selected at random from the set S given by:

      S = {x ∈ N | 1 ≤ x ≤ d}

Then, our reduced set, DR, will be given by:

      DR = D(:, Sk)

That is, DR is a data set having the same number of rows as D, and if Ai is the ith attribute of DR, then Ai is the jth attribute of D if j is the ith element of Sk.

2.8 The Modified Combined Approach

     As we saw in Section 2.5 above, the Combined Approach computes the average distance preservation of a combination of attributes by computing the average of their gxmid values. It is clear that for any given attribute, x, gxmid is only an estimate of its average distance preservation, since it is computed as the midpoint between gxm, the minimum distance preservation, and gxM, the maximum distance preservation.
     The modified version of the Combined Approach improves on the original version by computing the average distance preservation of a combination of attributes as the average of the actual distance preservations of each attribute. If x is an attribute of a dataset D, the actual distance preservation of x is computed as:

      gx = ( Σu=1..n Σv=u+1..n ||f(u) - f(v)||^2 / ||u - v||^2 ) / nr

where n is the number of rows of D, u and v are any two rows of D, and f(u) and f(v) are the corresponding rows in the dataset reduced to the single attribute x. The term nr in this equation is the number of pairs of rows of D, computed as:

      nr = nC2 = n(n - 1)/2

Thus, for any combination of attributes C of D, the average distance preservation of C is given as:

      gC = ( Σx∈C gx ) / nC

where nC is the number of attributes in C. Therefore, to reduce a dataset D from p columns to k columns, the modified version of the Combined Approach finds the combination C of k attributes of D whose value of gC is maximum.
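To make Sections 2.5 and 2.8 concrete, the following MATLAB sketch scores each attribute both by the midpoint estimate gxmid of the Combined Approach and by the actual average preservation gx of the Modified Combined Approach, and then keeps the k best-scoring attributes. The dataset and sizes are placeholders, the code assumes no two rows of D are identical, and it uses the fact that maximizing the average of per-attribute scores over all combinations is the same as keeping the k attributes with the largest individual scores:

      % Combined Approach (Section 2.5) and its modified version (Section 2.8).
      D = rand(30, 10); k = 4;                       % placeholder data and target size
      [n, p] = size(D);
      nr = n * (n - 1) / 2;                          % number of row pairs
      gmid = zeros(1, p);                            % gxmid for each attribute
      gavg = zeros(1, p);                            % gx (actual average) for each attribute
      for x = 1:p
          gxm = inf; gxM = -inf; total = 0;
          for u = 1:n-1
              for v = u+1:n
                  denom = sum((D(u,:) - D(v,:)).^2);        % ||u - v||^2
                  ratio = (D(u,x) - D(v,x))^2 / denom;      % ||f(u) - f(v)||^2 / ||u - v||^2
                  gxm = min(gxm, ratio);
                  gxM = max(gxM, ratio);
                  total = total + ratio;
              end
          end
          gmid(x) = (gxm + gxM) / 2;                 % Combined Approach score (gxmid)
          gavg(x) = total / nr;                      % Modified Combined Approach score (gx)
      end
      [~, order] = sort(gmid, 'descend');
      DR_combined = D(:, order(1:k));                % Combined Approach result
      [~, order] = sort(gavg, 'descend');
      DR_modified = D(:, order(1:k));                % Modified Combined Approach result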
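The Direct Approach of Section 2.6, by contrast, scores every combination of k attributes as a whole. A sketch, again with placeholder data and assuming no duplicate rows, is given below; it is feasible only for small p and k, since all nchoosek(p, k) combinations are examined:

      % Direct Approach (Section 2.6): score each combination C of k attributes by
      % gcmid = (gcm + gcM)/2 computed on the data reduced to the attributes in C.
      D = rand(20, 8); k = 3;                        % placeholder data and target size
      [n, p] = size(D);
      combos = nchoosek(1:p, k);                     % all combinations of k attribute indices
      best = -inf; bestC = combos(1, :);
      for c = 1:size(combos, 1)
          C = combos(c, :);
          gcm = inf; gcM = -inf;
          for u = 1:n-1
              for v = u+1:n
                  denom = sum((D(u,:) - D(v,:)).^2);         % ||u - v||^2
                  num   = sum((D(u,C) - D(v,C)).^2);         % ||f(u) - f(v)||^2
                  gcm = min(gcm, num / denom);
                  gcM = max(gcM, num / denom);
              end
          end
          if (gcm + gcM) / 2 > best
              best = (gcm + gcM) / 2;
              bestC = C;
          end
      end
      DR = D(:, bestC);                              % reduced dataset

Replacing the midpoint (gcm + gcM)/2 with the mean of the same pairwise ratios gives the modified version of the Direct Approach described in Section 2.9.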
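The New Random Approach of Section 2.7 needs only a random choice of attribute indices; a short sketch, assuming the k indices are drawn without repetition, as the set notation Sk suggests:

      % New Random Approach (Section 2.7): keep k attribute indices chosen at random.
      D = rand(30, 10); k = 4;          % placeholder data and target size
      Sk = randperm(size(D, 2), k);     % k distinct numbers drawn from 1..d
      DR = D(:, Sk);                    % DR = D(:, Sk)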
2.9 The Modified Direct Approach

     Like the Direct Approach, to reduce a dataset Dnxp to a dataset containing k columns, the modified version of the Direct Approach selects the combination of k attributes which best preserves the interpoint distances, and reduces the original dataset to a dataset containing only those k attributes. To do so, it first generates all possible combinations of k attributes from the original p attributes. However, for each combination, C, instead of estimating its average distance preservation using its gcmid value, it computes the actual average distance preservation of C using the following formula:

      gC = ( Σu=1..n Σv=u+1..n ||f(u) - f(v)||^2 / ||u - v||^2 ) / nr

where n is the number of rows of D, u and v are any two rows of D, and f(u) and f(v) are the corresponding rows in the dataset D reduced to the attributes of C. Once again, the term nr in this equation is the number of pairs of rows of D, computed as:

      nr = nC2 = n(n - 1)/2

Therefore, to reduce a dataset D from p columns to k columns, the modified version of the Direct Approach finds the combination C of k attributes of D whose value of gC is maximum.

2.10 The Modified New Random Approach

     This technique is suggested as an improvement of the New Random Approach discussed in Section 2.7 above. To reduce a dataset Dnxp from p attributes to k attributes using the modified version of the New Random Approach, we use the algorithm below. Clearly, the idea here is to generate a result which is less random (and thus more efficient) than the result of the New Random Approach. Note that m in the algorithm is the number of times the execution of the New Random Approach is repeated.

Algorithm

      M = []
      for i = 1 to m do
          Run the New Random Approach to generate k numbers at random in the range 1..p
          Store the list of numbers generated as the ith row of M
      end
      Generate the one-dimensional matrix M1 with p entries such that M1[i] holds the frequency of the number i in the matrix M
      Finally, generate the matrix Result which contains the k numbers of highest frequency in M, arranged in ascending order

Thus, if D is the original dataset, the result of reducing D using the modified version of the New Random Approach is given by:

      DR = D(:, Result)

Below is the result of a sample run of the program with D as given in Table 1 below (and with m = 4):

i) After the first run of NRA:
      M = [ 9  2  3  7  5  1  10 ]

ii) After the second run of NRA:
      M = [ 9  2  3  7  5  1  10
            6  4  5  3  2  8   7 ]

iii) After the third run of NRA:
      M = [ 9  2  3  7  5  1  10
            6  4  5  3  2  8   7
            1  2  9  4  8  5   3 ]

iv) After the fourth run of NRA:
      M = [ 9  2  3  7  5  1  10
            6  4  5  3  2  8   7
            1  2  9  4  8  5   3
            2  5  6  8  1  7  10 ]

Thus: M1 = [3 4 3 2 4 2 3 3 2 2] and

      Result = [1 2 3 4 5 7 8]

Thus the result of reducing the dataset D using the modified version of the New Random Approach is the dataset DR given in Table 2 below.

                 Table 1: The D Dataset

                 Table 2: The DR Dataset
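A small MATLAB sketch of the algorithm above (the dataset, k and m are placeholders, and ties in the frequency count are broken arbitrarily, just as in the sample run):

      % Modified New Random Approach (Section 2.10): run the New Random Approach m
      % times, count how often each attribute index appears, and keep the k most
      % frequent indices in ascending order.
      D = rand(30, 10); k = 7; m = 4;           % placeholder data; m = number of runs
      p = size(D, 2);
      M = zeros(m, k);
      for i = 1:m
          M(i, :) = randperm(p, k);             % one run of the New Random Approach
      end
      M1 = accumarray(M(:), 1, [p 1]);          % M1(i) = frequency of index i in M
      [~, order] = sort(M1, 'descend');
      Result = sort(order(1:k))';               % k most frequent indices, ascending
      DR = D(:, Result);                        % DR = D(:, Result)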
3.0 Reducing Images Using These Techniques

     In this section, we shall use each of the techniques examined in Section 2 above to reduce images, and the effects of each reduction will be presented.
     To achieve this aim, we shall make use of the MATLAB functions imread, which converts an image into a matrix, and imshow, which converts a matrix of pixel brightness values back into an image. (In other words, the function of imread is the reverse of the function of imshow.)

3.1 Random Projection

     Random Projection is useless in preserving images. The simple reason is that when we multiply the matrix representing an image by a random matrix R, the resulting matrix typically has values outside the range of pixel brightness values. In our experiment, this is the result we obtained:

Original Image:

Reduced Image:

3.2 Principal Component Analysis (PCA)

     Like RP, PCA is useless in preserving images. The following is the result obtained when we tried to reduce an image using PCA:

Original Image:

Reduced Image:

3.3 Variance

     Apart from RP and PCA, all the other methods we implemented were reasonably efficient in preserving images. With the Variance method, the results obtained are displayed below:

Original Image:

Reduced Image:
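As an illustration of how the experiments in this section can be reproduced, the sketch below reads an image into a matrix with imread, reduces the matrix with the Variance Approach, and displays both matrices with imshow. The file name, the conversion to a single grey channel, and the 60% target are assumptions made for the sketch, not details taken from the paper's experiments:

      % Read an image, reduce its pixel matrix with the Variance Approach (Section 2.3),
      % and display the original and reduced matrices.
      M = imread('sample.jpg');                 % hypothetical file name
      if ndims(M) == 3
          M = rgb2gray(M);                      % work on a single brightness channel
      end
      M = double(M);
      r = round(0.6 * size(M, 2));              % keep 60% of the columns
      [~, order] = sort(var(M), 'descend');     % columns by decreasing variance
      MR = M(:, sort(order(1:r)));              % keep the chosen columns in their original order
      figure; imshow(uint8(M));  title('Original Image');
      figure; imshow(uint8(MR)); title('Reduced Image');

Keeping the selected columns in their original left-to-right order (the inner sort around order(1:r)) is only a presentation choice for viewing; the reduction itself is still the Variance Approach of Section 2.3.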
3.4 Combined Approach

Original Image:

Reduced Image:

3.5 Direct Approach

Original Image:

Reduced Image:

3.6 The Modified Combined Approach

Original Image:

Reduced Image:

3.7 The Modified Direct Approach

Original Image:

Reduced Image:
3.8 LSA-Transform

Original Image:

Reduced Image:

3.9 The New Random Approach

Original Image:

Reduced Image:

3.10 New Random Approach (Modified Version)

Original Image:

Reduced Image:

Remark:

     As can be observed from the results above, some reduction methods (such as the Combined and Direct Approaches and their modified versions) maintain the sizes of the original image while others do not.

4.0 Comparisons Between the Different Dimensionality Reduction Techniques

     As mentioned above, there are two types of reduction techniques: those in which each attribute of the reduced set is a linear combination of the attributes in the original data set, and those which reduce a dataset to a subset of the original attribute set. Of the ten techniques implemented in this paper, two of them (RP and PCA) belong to the first category, and as we have seen, they are both useless in preserving images. This applies to almost every technique in this category. Two exceptions in this regard include Two Dimensional PCA and the Discrete Cosine Transform (Nsang 2011). All the other eight techniques implemented in this paper belong to the second category, and as we have seen, they are all efficient in preserving images.
     Of these eight techniques, as mentioned above, LSA-Transform is probably the best in preserving images. Apart from the fact that the quality of the reduced image is practically the same as the quality of the original image, its speed of execution is very high. All the other seven techniques also significantly maintain the quality of the original image, especially when most of the attributes of the matrix representing the original image are maintained – for instance, when the number of attributes of the matrix representing the reduced image is at least 90% of the number of attributes of the matrix representing the original image. However, if we reduce the matrix representing the original image to 60%, say (as is the case in this paper), some of these methods are more efficient than others in preserving the original image. From best to worst (as we can see from the results above), we have the New Random Approach, the Modified New Random Approach and the Variance Approach, followed by the Direct and Combined Approaches and their modified versions. Interestingly, these last four approaches are also the least time efficient. As a matter of fact, these last four approaches could take many days to run!
     Because the Combined and Direct approaches and their modified versions have high run-time complexities, they are only suitable for reducing small images. The Variance approach, on the other hand, has the lowest run-time complexity (apart from LSA-Transform, of course), which makes it the most suitable for reducing large images. Obviously, the New Random Approach and its modified version are also suitable for reducing large images.

5.0 Conclusion and Future Work

     In this paper, we have studied two categories of dimensionality reduction techniques: those in which each attribute in the reduced set is a linear combination of the attributes in the original set, and those which reduce a data set to a proper subset of the original attribute set. As we have realized, while most of the techniques in the first category are useless in preserving images, every technique in the second category can be used to preserve images. Image preservation is a very important application of dimensionality reduction, as it enables us to conserve memory and to improve the speed of execution of any programs which use these images.
     We also noticed that while LSA-Transform, the New Random Approach, the Modified New Random Approach, and the Variance Approach have low run-time complexities and can be used to reduce large images, the Direct and Combined Approaches and their modified versions have very large run-time complexities and can only be used to reduce small images.
     We shall compare the extents to which each method discussed in this paper preserves the interpoint distances and k-means clustering of different datasets in another work. We shall also analyze the time complexity of each of the ten techniques implemented in this paper. We could not do these in this paper due to time and space constraints.

References

Nsang, A., and Ralescu, A. 2010. Approaches to Dimensionality Reduction to a Subset of the Original Dimensions. In Proceedings of the Twenty-First Midwest Artificial Intelligence and Cognitive Science Conference (MAICS 2010), South Bend, IN, 70-77.

Nsang, A., and Ralescu, A. 2009. A Review of Dimensionality Reduction and Their Applications. In Proceedings of the Twentieth Midwest Artificial Intelligence and Cognitive Science Conference (MAICS 2009), Fort Wayne, IN, 118-123.

Nsang, A. 2011. Novel Approaches to Dimensionality Reduction and Applications, An Empirical Study. Lambert Academic Publishing.

Bingham, E., and Mannila, H. 2001. Random Projection in Dimensionality Reduction: Applications to Image and Text Data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 245-250.

Johnson, W. B., and Lindenstrauss, J. 1984. Extensions of Lipschitz Mappings into a Hilbert Space. Contemporary Mathematics.

Nsang, A., and Ralescu, A. 2010. More Dimensionality Reduction to a Subset of the Original Attribute Set. In Proceedings of the Twenty-First Midwest Artificial Intelligence and Cognitive Science Conference (MAICS 2010), South Bend, IN, 109-116.

Nsang, A., Maikori, A., Oguntoyinbo, F., and Yusuf, H. 2015. A New Random Approach to Dimensionality Reduction. Unpublished.