Tensor Decomposition for 3D Bars Problem

Jan Platoš, Jana Kočíbová, Pavel Krömer, Pavel Moravec, Václav Snášel

Department of Computer Science, FEECS, VŠB – Technical University of Ostrava,
17. listopadu 15, 708 33 Ostrava-Poruba, Czech Republic
{jan.platos, jana.kocibova.st1, pavel.kromer.fei, pavel.moravec, vaclav.snasel}@vsb.cz

Abstract. In this paper, we compare the performance of several dimension reduction techniques, namely SVD, NMF and SDD. The qualitative comparison is evaluated on a collection of bars; we compare the quality of these methods on the basis of the visual impact. We also compare the dimension reduction techniques SVD and HO-SVD on tensors – 3D bars.

1 Introduction

In order to perform object recognition (of whatever kind), it is necessary to learn representations of the underlying characteristic components. Such components correspond to object parts, or features. These components can occur in different configurations to form many distinct images. Identifying the underlying components which are combined to form images is thus essential for learning the perceptual representations necessary for performing object recognition.

Feature extraction on binary datasets addresses many problem areas, such as association rule mining, itemsets used for market basket analysis, discovery of regulation patterns in DNA microarray experiments, etc. For simplicity's sake we use the well-known bars problem (see e.g. [2]), where we try to isolate separate horizontal and vertical bars from images containing their combinations.

In this paper we concentrate on feature extraction methods which use known dimension reduction techniques for automatic feature extraction, and we use the bars collection as a benchmark collection.

The bars problem (and its variations) is a benchmark task for the learning of independent image features (Földiák [2]; Spratling [6]). In the standard version of the bars problem, as defined by Földiák [2], the training data consist of 8 × 8 pixel images in which each of the 16 possible (one-pixel-wide) horizontal and vertical bars can be present with a probability of 1/8. Typical examples of training images are shown in Figure 1.

One of the well-known methods of feature extraction is the singular value decomposition (SVD), which has already been used successfully for automatic feature extraction. We extended the bars problem to three dimensions, using planes instead of lines. The input cube contains several planes, which may or may not be parallel to the x, y and z axes.

The straightforward approach to image indexing is to transform the 2D images into a single vector. This is often done by concatenating all the rows of an image into a single image vector [7] (although a more sophisticated method can be used). We will use a similar approach for 3D bars and classic SVD, combining two dimensions into one, so that we can compare the original and reconstructed matrices based on the visual impact and the Frobenius norm.
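As an illustration of this flattening step, a minimal Python/NumPy sketch (not part of the original experiments; the array names and the example plane are ours) could look as follows:

    import numpy as np

    # A 2D image is turned into a single vector by concatenating its rows [7].
    image = np.zeros((32, 32))
    image[5, :] = 1                      # one horizontal bar
    image_vector = image.reshape(-1)     # 1024-dimensional image vector

    # For a 3D bars cube we combine two dimensions into one, so that an
    # 8 x 8 x 8 tensor becomes an ordinary 64 x 8 matrix that classic SVD accepts.
    cube = np.zeros((8, 8, 8))
    cube[:, :, 3] = 1                    # a plane parallel to two of the axes
    flattened = cube.reshape(8 * 8, 8)   # 64 x 8 matrix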
The rest of this paper is organized as follows. The second section explains the dimension reduction methods used for the classic 2D bars problem. The third section describes CubeSVD, which is used for the 3D bars problem. The fourth and fifth sections describe experimental results for 2D and 3D bars, and the final section draws some conclusions.

2 Dimension Reduction

We compared several promising dimension reduction methods – Singular Value Decomposition (SVD), Semi-Discrete Decomposition (SDD) and Non-negative Matrix Factorization (NMF). All of them are briefly described below.

2.1 Singular Value Decomposition

SVD [1] is an algebraic extension of the classical vector model. It is similar to the PCA method, which was originally used for the generation of eigenfaces in image retrieval. Informally, SVD discovers significant properties and represents the images as linear combinations of the base vectors. Moreover, the base vectors are ordered according to their significance for the reconstructed image, which allows us to consider only the first k base vectors as important (the remaining ones are interpreted as "noise" and discarded). Furthermore, SVD is often reported to be more successful in recall when compared to querying whole image vectors [1].

Formally, we decompose the matrix of images A by singular value decomposition (SVD), calculating singular values and singular vectors of A.

We have an n × m rank-r matrix A (where m ≥ n without loss of generality), and the values σ_1, …, σ_r are calculated from the eigenvalues of the matrix AA^T as σ_i = √λ_i. Based on them, we can calculate column-orthonormal matrices U = (u_1, …, u_n) and V = (v_1, …, v_n), where U^T U = V^T V = I_n, and a diagonal matrix Σ = diag(σ_1, …, σ_n), where σ_i > 0 for i ≤ r, σ_i ≥ σ_{i+1} and σ_{r+1} = ⋯ = σ_n = 0. The decomposition

A = U Σ V^T    (1)

is called the singular value decomposition of the matrix A, and the numbers σ_1, …, σ_r are the singular values of A. The columns of U (or V) are called the left (or right) singular vectors of matrix A.

Now we have a decomposition of the original matrix of images A. We get r nonzero singular values, where r is the rank of the original matrix A. Because the singular values usually fall quickly, we can take only the k greatest singular values with the corresponding singular vector coordinates and create a k-reduced singular decomposition of A.

Let us have k (0 < k < r) and the singular value decomposition of A

A = U Σ V^T = (U_k U_0) diag(Σ_k, Σ_0) (V_k V_0)^T ≈ U_k Σ_k V_k^T = A_k    (2)

We call A_k = U_k Σ_k V_k^T a k-reduced singular value decomposition (rank-k SVD). Instead of the A_k matrix, a matrix of image vectors in reduced space D_k = Σ_k V_k^T is used in SVD as the representation of the image collection. The image vectors (columns in D_k) are now represented as points in k-dimensional space (the feature space).
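For illustration, the rank-k reduction can be sketched in a few lines of NumPy (our sketch, not a reference implementation; A is assumed to hold one image vector per column):

    import numpy as np

    def rank_k_svd(A, k):
        """Return the rank-k approximation A_k = U_k S_k V_k^T and the
        reduced representation D_k = S_k V_k^T of the image collection A."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) V^T
        U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]        # keep the k greatest singular values
        A_k = U_k @ np.diag(s_k) @ Vt_k                    # rank-k reconstruction
        D_k = np.diag(s_k) @ Vt_k                          # images as points in k-dimensional feature space
        return A_k, D_k

    # Example: 1024-dimensional image vectors of 1600 bar images, reduced to k = 64.
    A = np.random.rand(1024, 1600)          # stand-in for the real image matrix
    A_64, D_64 = rank_k_svd(A, 64)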
2.2 Semi-discrete Decomposition

SDD is another LSI-related method, proposed for text retrieval in [3]. As mentioned earlier, the rank-k SVD method (called truncated SVD by the authors of the semi-discrete decomposition) produces dense matrices U and V, so the resulting required storage may be even larger than the one needed by the original term-by-document matrix A.

To improve the required storage size and query time, the semi-discrete decomposition was defined as

A ≈ A_k = X_k D_k Y_k^T,    (3)

where the entries of the matrices X_k and Y_k are constrained to the set {−1, 0, 1}, and D_k is a diagonal matrix with positive entries.

The SDD does not reproduce A exactly, even if k = n, but it uses very little storage with respect to the observed accuracy of the approximation. A rank-k SDD (although from a mathematical standpoint it is a sum of rank-1 matrices) requires the storage of k(m + n) values from the set {−1, 0, 1} and k scalars. The scalars need to be only single precision because the algorithm is self-correcting. The SDD approximation is formed iteratively. The optimal choice of the triplets (x_i, d_i, y_i) for a given k can be determined using a greedy algorithm, based on the residual R_k = A − A_{k−1} (where A_0 is the zero matrix).

2.3 Non-negative Matrix Factorization

The NMF [5] method calculates an approximation of the matrix A as a product of two matrices, W and H. The matrices are usually pre-filled with random values (or H is initialized to zero and W is randomly generated). During the calculation the values in W and H stay non-negative. The approximation of the matrix A, denoted A_k, is calculated as A_k = W H.

The original NMF method tries to minimize the Frobenius norm of the difference between A and A_k, using min_{W,H} ‖A − W H‖_F^2 as the criterion in the minimization problem. Recently, a new method was proposed in [6], where the constrained least squares problem min_{H_j} {‖A_j − W H_j‖^2 + λ‖H_j‖_2^2} is the criterion in the minimization problem. This approach yields better results for sparse matrices.

Unlike in SVD, the base vectors are not ordered from the most general one, and we have to calculate the decomposition for each value of k separately.

Fig. 1. Some more complex 2D bars

3 3D Theory

A tensor is a higher-order generalization of a vector: a vector is a first-order tensor and a matrix is a second-order tensor. The order of a tensor A ∈ R^{I_1 × I_2 × ⋯ × I_N} is N. The elements of A are denoted a_{i_1 … i_n … i_N}, where 1 ≤ i_n ≤ I_n.

Two basic operations are needed for the calculation of CubeSVD: the unfolding A_(n) of a tensor A, and the mode-n product A ×_n M of a tensor A and a matrix M. The unfolding operation flattens the tensor A into the matrix A_(n) along mode n. Each column of the matrix A_(n) is composed of the elements a_{i_1 … i_n … i_N} where i_n varies and the other indices are fixed. The unfolding operation is illustrated in Figure 2 for a third-order tensor. See [4] for details on the unfolding of a tensor A.

The n-mode product of a tensor A ∈ R^{I_1 × I_2 × ⋯ × I_N} by a matrix M ∈ R^{J_n × I_n} is an I_1 × I_2 × ⋯ × I_{n−1} × J_n × I_{n+1} × ⋯ × I_N tensor whose entries are given by

(A ×_n M)_{i_1 ⋯ i_{n−1} j_n i_{n+1} ⋯ i_N} = Σ_{i_n} a_{i_1 ⋯ i_{n−1} i_n i_{n+1} ⋯ i_N} m_{j_n i_n}

See [4] for details on the mode-n product of a tensor A and a matrix M. In terms of n-mode products, the matrix SVD can be rewritten as A = Σ ×_1 V^(1) ×_2 V^(2).

CubeSVD is a generalization of SVD and was described in [4]. A tensor A can be written as the n-mode product [4]

A = S ×_1 V_1 ×_2 V_2 ⋯ ×_N V_N

as illustrated in Figure 3 for N = 3. S is called the core tensor. In general, S is a full tensor rather than a pseudodiagonal one (pseudodiagonality would mean that nonzero elements could only occur when i_1 = i_2 = ⋯ = i_N); instead, S has the property of all-orthogonality [4]. Each matrix V_n contains orthonormal vectors, called the n-mode singular vectors. The Frobenius norms ‖S_{i_n=i}‖ are the n-mode singular values of A, and they are ordered: ‖S_{i_n=1}‖ ≥ ‖S_{i_n=2}‖ ≥ ⋯ ≥ ‖S_{i_n=I_n}‖ ≥ 0.

Fig. 2. Unfolding of a third-order tensor A
Fig. 3. Order-3 singular value decomposition of a tensor A
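The two operations and the resulting order-3 decomposition can be sketched in NumPy as follows (a simplified illustration of the decomposition described in [4], not the implementation used in our experiments; the column ordering of the unfolding may differ from [4], which does not affect the computed singular vectors):

    import numpy as np

    def unfold(A, n):
        """Mode-n unfolding A_(n): mode n becomes the rows, the remaining modes the columns."""
        return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

    def mode_n_product(A, M, n):
        """Mode-n product A x_n M of a tensor A and a matrix M of shape (J_n, I_n)."""
        moved = np.moveaxis(A, n, 0)                  # bring mode n to the front
        result = np.tensordot(M, moved, axes=(1, 0))  # contract over i_n
        return np.moveaxis(result, 0, n)

    def cube_svd(A):
        """Order-3 decomposition A = S x_1 V1 x_2 V2 x_3 V3."""
        V = [np.linalg.svd(unfold(A, n), full_matrices=False)[0] for n in range(3)]
        S = A
        for n in range(3):                            # core tensor S = A x_1 V1^T x_2 V2^T x_3 V3^T
            S = mode_n_product(S, V[n].T, n)
        return S, V

The n-mode singular values are then obtained as the Frobenius norms of the corresponding slices of the core tensor S.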
4 Experimental Results – 2D Bars

For testing the above-mentioned methods, we used a generated collection of 1600 black-and-white images of size 32 × 32, containing different combinations of horizontal and vertical lines (bars). The probability of each bar occurring in an image was the same, equal to 10/64, i.e. an image contains 10 bars on average. Several images from the generated collection are shown in Figure 4.

Fig. 4. Some generated images from the bars collection

Many of the tested methods were able to generate a set of base images or factors, which should ideally record all possible bar positions. However, not all methods were truly successful in this.

With SVD, we obtain classic singular vectors, the most general being among the first. The first few are shown in Figure 5. We can see that the bars are not separated and different shades of gray appear.

Fig. 5. First 64 base images – SVD method

The NMF methods yield different results. The original NMF method, based on the adjustment of the random matrices W and H, provides hardly recognizable images even for k = 100 and 1000 iterations (we used 100 iterations in the other experiments). Moreover, these base images still contain significant salt-and-pepper noise and have bad contrast. The factors are shown in Figure 6. We must also note that the NMF decomposition yields slightly different results each time it is run, because the matrices are pre-filled with random values.

Fig. 6. First 64 factors – original NMF method
Fig. 7. First 64 factors for GD-CLS NMF method (0.01)

The SDD method differs slightly from the previous methods, since each factor contains only the values {−1, 0, 1}. Gray in the factors shown in Figure 8 represents 0; −1 and 1 are represented by black and white, respectively. The base vectors in Figure 8 can be divided into three categories:

1. Base vectors containing only one bar.
2. Base vectors containing one horizontal and one vertical bar.
3. Other base vectors, containing several bars and in some cases even noise.

Fig. 8. First 64 base vectors – SDD method

5 Experimental Results – 3D Bars

For testing the CubeSVD method, we used several collections of 8 × 8 × 8 three-dimensional cubes. We created two types of test collections.

The first type contains two collections with 15 cubes each, which were used for local feature extraction (each cube was decomposed separately). The first collection contains cubes which are crossed by two perpendicular planes (Figure 9a). The second collection contains cubes crossed by three planes – two perpendicular and one skewed (Figure 9b). The resulting number of singular values was between 1 and 8 (full CubeSVD).

Fig. 9. Collections for local feature extraction: (a) example from the first collection, (b) example from the second collection

The second type contains four collections with 1000 cubes each, which were used for collection-based feature extraction. The first collection contains cubes with one skewed plane, the second collection contains cubes with two skewed planes, and so on. Example cubes from each collection are depicted in Figure 10.
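For concreteness, cubes of the kind used in these collections can be generated along the following lines (a hypothetical sketch, not the generator actually used; in particular, the definition of the skewed plane shown here is only one possible choice):

    import numpy as np

    def cube_with_axis_parallel_plane(size=8, axis=0, position=3):
        """An axis-parallel plane: all voxels with one coordinate fixed."""
        cube = np.zeros((size, size, size))
        index = [slice(None)] * 3
        index[axis] = position
        cube[tuple(index)] = 1
        return cube

    def cube_with_skewed_plane(size=8, offset=0):
        """A simple skewed plane: voxels where k == i + offset (diagonal in the i-k direction)."""
        i, j, k = np.indices((size, size, size))
        return (k == i + offset).astype(float)

    # A cube crossed by two perpendicular planes, as in the first local-feature collection.
    cube = np.clip(cube_with_axis_parallel_plane(axis=0, position=2)
                   + cube_with_axis_parallel_plane(axis=1, position=5), 0, 1)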
Fig. 10. Collections for collection-based feature extraction

As a measure of the similarity between the original and the reduced cubes we used the Frobenius norm of their difference, without calculating the square root:

F^2(O, R) = Σ_i Σ_j Σ_k (O[i, j, k] − R[i, j, k])^2

5.1 Local feature extraction

In the first experiment we applied the CubeSVD and SVD algorithms to the first two collections in order to extract local features. Results for the first collection (two perpendicular planes) are depicted in Figure 11 for 6 singular values and in Figure 12 for 2 singular values. The values of the Frobenius norm measure are shown in Tables 1 and 2. It can be seen in Figure 11 that CubeSVD extracts all of the original bars, while SVD with the same rank ignored one of the bars and reconstructed the other two more sharply. This is even more visible in Figure 12, where CubeSVD records all bars only faintly, but classic SVD reconstructs one of the bars perfectly, adding noise to the other areas. We see that this behavior leads to a lower Frobenius norm for the classic SVD in Tables 1 and 2, which satisfies the condition that the approximation error of SVD should be minimal for a given rank k.

Fig. 11. Results for the first collection and 6 singular values
Fig. 12. Results for the first collection and 2 singular values
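A possible way to compute this measure and to obtain reduced cubes of the kind compared in Tables 1 and 2 is sketched below (our reading of the experiment, not the original code; the mode-n product is the same helper as in the Section 3 sketch):

    import numpy as np

    def squared_frobenius(original, reduced):
        """F^2(O, R): sum of squared voxel-wise differences."""
        return np.sum((original - reduced) ** 2)

    def mode_n_product(A, M, n):
        """Mode-n product A x_n M of a tensor A and a matrix M of shape (J_n, I_n)."""
        return np.moveaxis(np.tensordot(M, np.moveaxis(A, n, 0), axes=(1, 0)), 0, n)

    def reduce_cube_svd(cube, K):
        """Keep only the first K n-mode singular vectors in every mode and reconstruct."""
        V = [np.linalg.svd(np.moveaxis(cube, n, 0).reshape(cube.shape[n], -1),
                           full_matrices=False)[0][:, :K] for n in range(3)]
        core = cube
        for n in range(3):
            core = mode_n_product(core, V[n].T, n)      # truncated K x K x K core tensor
        reduced = core
        for n in range(3):
            reduced = mode_n_product(reduced, V[n], n)  # back to an 8 x 8 x 8 cube
        return reduced

    def reduce_flat_svd(cube, K):
        """Classic SVD on the cube flattened to a 64 x 8 matrix, truncated to rank K."""
        matrix = cube.reshape(-1, cube.shape[2])
        U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
        return (U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]).reshape(cube.shape)

    # e.g. squared_frobenius(cube, reduce_cube_svd(cube, K)) for a cube from the previous sketch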
Table 1. Cumulative Frobenius norm for the first collection

K   CubeSVD     SVD
1   1484.31     735
2   1356.21     630
3   1304.02     525
4   1169.81     420
5    976.62     315
6    708.54     210
7    379.29     105
8      0          0

Table 2. Cumulative Frobenius norm for the second collection

K   CubeSVD     SVD
1   1768.84   1045.91
2   1587.14    854.01
3   1489.88    677.87
4   1340.44    514.32
5   1091.12    364.01
6    813.81    228.41
7    438.31    102.49
8      0          0

6 Conclusion

Since the CubeSVD method provided only one singular value for planes parallel to the axes, which was to be expected, the experiments were done on planes both perpendicular and skewed.

It seems that the original SVD performed better than CubeSVD in terms of the Frobenius norm, but the visual inspection of the reduced tensors shows the reason – whilst the horizontal bars were reconstructed nearly perfectly, the vertical ones deteriorated more quickly. On the other hand, CubeSVD tried to minimize the overall error.

We are currently studying the collection-based feature extraction of 3D bars, where the number of singular values ranges from 1 to 512 for 8 × 8 × 8 cubes, compared to 8 singular values for high-order SVD. The classic SVD results for 1 to 8 singular values mentioned in the previous section are not satisfactory.

We are currently extending our CubeSVD implementation to support tensors of four and more dimensions, and we are preparing to test the high-order SDD and NMF methods against their 2D counterparts.

References

1. M. Berry, S. Dumais, and T. Letsche. Computational Methods for Intelligent Information Access. In Proceedings of the 1995 ACM/IEEE Supercomputing Conference, San Diego, California, USA, 1995.
2. P. Földiák. Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics, 64:165–170, 1990.
3. T. G. Kolda and D. P. O'Leary. Computation and uses of the semidiscrete matrix decomposition. ACM Transactions on Information Processing, 2000.
4. L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl., 21(4):1253–1278, 2000.
5. F. Shahnaz, M. Berry, P. Pauca, and R. Plemmons. Document clustering using nonnegative matrix factorization. Information Processing and Management, 42:373–386, 2006.
6. M. W. Spratling. Learning Image Components for Object Recognition. Journal of Machine Learning Research, 7:793–815, 2006.
7. M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.