Dimensionality reduction for financial data visualization

Jelena Zubova
Institute of Mathematics and Informatics, Vilnius University, Vilnius, Lithuania
e-mail: jelena.zubova@mii.vu.lt

Olga Kurasova
Institute of Mathematics and Informatics, Vilnius University, Vilnius, Lithuania
e-mail: olga.kurasova@mii.vu.lt

Marius Liutvinavičius
Kaunas Faculty, Vilnius University, Kaunas, Lithuania
e-mail: marius.liutvinavicius@khf.vu.lt

Abstract—Various data mining methods are used for examining large financial data sets to uncover hidden and useful information. The ability to access big data sources raises new challenges related to the capability to handle such enormous amounts of data. This research focuses on big financial data visualization based on dimensionality reduction methods. We use a data set that contains financial ratios of stocks traded on the NASDAQ stock exchange. A brief overview of the most popular dimensionality reduction and visualization methods is presented in this paper. We also show how to adjust the algorithms of these methods for parallel computing. MPI technology is applied in a computer cluster to perform dimensionality reduction. The results show that the Random projection and Multidimensional scaling methods can effectively classify data and find the most promising stocks.

Keywords—financial data; dimensionality reduction; visualization

I. INTRODUCTION

We use big data analytics for examining large financial data sets to uncover hidden and useful information. The ability to access new data sources provides many opportunities, for example, creating new investor sentiment indexes from online social media streams. But it also raises new challenges related to the capability to handle such enormous amounts of data. We need effective methods and powerful environments to complete large and complex tasks. This research focuses on big financial data visualization based on dimensionality reduction methods. We use a data set that contains financial ratios of stocks traded on the NASDAQ stock exchange. Our goal is to find the most effective ways to analyze and visualize such data.

In the second section we present a brief overview of the most popular dimensionality reduction and visualization methods. In the third section we show how to adjust the algorithms of these methods for parallel computing; MPI technology is used in a computer cluster to perform dimensionality reduction. In the last section we present the classification and visualization results obtained using the Random projection and Multidimensional scaling methods. The current data set is relatively small, but the results suggest which techniques to choose for future work with big online data streams.

II. DIMENSIONALITY REDUCTION METHODS

Detailed reviews of dimensionality reduction methods were given by I. K. Fodor (2002) [3], M. Mizuta (2004) [11], and C. O. S. Sorzano, J. Vargas et al. (2014) [5]. In this section we present a brief summary of the most popular methods in order to choose the one best suited to our case.

Dimensionality reduction refers to the process of taking a data set with a usually large number of dimensions and creating a new data set with fewer dimensions, which are meant to capture only those dimensions that are in some sense "important". The idea is to preserve as much "structure" of the data as possible while reducing the number of dimensions [6].

The demand for such methods arises because various sources generate enormous amounts of data, e.g. laboratory instruments can report thousands of measurements for a single experiment, and statistical methods face challenging tasks when dealing with such high-dimensional data. However, according to C. O. S. Sorzano, J. Vargas et al. (2014) [5], much of the data is highly redundant and can be efficiently brought down to a much smaller number of variables without a significant loss of information.

Fig. 1. Dimensionality reduction method types
A. Principal Component Analysis (PCA)

As long as data have a near-linear structure, the singularities of the data can be pointed out using Principal Component Analysis (PCA) [11]. PCA is by far one of the most popular algorithms for dimensionality reduction [5].

PCA finds components that make the projections uncorrelated by selecting the highest eigenvalues of the covariance matrix, and it maximizes the retained variance [2]. The theoretical idea behind PCA is to find the principal components of the data, which correspond to the directions along which there is the most variation [6].

B. Random Projection

Random Projection finds components that make the projections uncorrelated by multiplying the data by a random matrix, and it minimizes the computations for a particular dimension size [2].

This method involves taking a high-dimensional data set and mapping it into a lower-dimensional space, while providing some guarantees on the approximate preservation of distances. If the input data A is an n × d matrix, then to do a projection we choose a "suitable" d × k matrix R and define the projection of A to be E = AR, which stores k-dimensional approximations of our n points (matrix E is n × k) [6]. R is a matrix with elements rij, where rij is drawn from a random Gaussian distribution. R can also be constructed in one of the following ways [2]:

- rij = ±1, each with probability 0.5;
- rij = ±1, each with probability 1/6, and 0 with probability 2/3.

Steve Vincent (2004) compared principal component analysis and random projection in text mining. He found that in general Random projection is faster than PCA by many orders of magnitude, but in most cases it produced lower accuracy. This leads to the suggestion to use the Random projection method when processing speed is most important [2].
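As an illustration of the projection E = AR described above, the following NumPy sketch builds R in the three ways listed (Gaussian, ±1, and the sparse ±1/0 variant). The function name and the toy data sizes are our own and do not come from the cited works.

```python
import numpy as np

def random_projection(A, k, mode="gaussian", seed=0):
    """Project an n x d data matrix A down to k dimensions: E = A R."""
    n, d = A.shape
    rng = np.random.default_rng(seed)
    if mode == "gaussian":
        R = rng.standard_normal((d, k))               # r_ij drawn from N(0, 1)
    elif mode == "sign":
        R = rng.choice([-1.0, 1.0], size=(d, k))      # +-1 with probability 0.5 each
    else:                                             # sparse variant
        R = rng.choice([-1.0, 0.0, 1.0], size=(d, k),
                       p=[1/6, 2/3, 1/6])             # +-1 w.p. 1/6, 0 w.p. 2/3
    return A @ R                                      # n x k matrix of projected points

# Toy usage: 1400 "stocks" with 54 ratios reduced to 5 dimensions
A = np.random.rand(1400, 54)
E = random_projection(A, k=5, mode="sign")
print(E.shape)  # (1400, 5)
```

For comparison, PCA would instead compute the top k eigenvectors of the covariance matrix of A, which is considerably more expensive for large d; that cost difference is the source of the speed gap reported above.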
C. Factor analysis

Like PCA, factor analysis is a linear method based on second-order data summaries. Factor analysis assumes that the measured variables depend on some unknown, and often unmeasurable, common factors. The goal of this method is to uncover such relations, and it can thus be used to reduce the dimension of data sets that follow the factor model [3].

The factors are assumed to follow a multivariate normal distribution and to be uncorrelated with the noise [5].

D. Independent component analysis (ICA)

ICA is a higher-order method that seeks linear projections, not necessarily orthogonal to each other, that are as nearly statistically independent as possible. Statistical independence is a much stronger condition than uncorrelatedness. ICA can be considered a generalization of the PCA and Projection pursuit concepts. While PCA seeks uncorrelated variables, ICA seeks independent variables. In contrast with PCA, the goal of ICA is not necessarily dimension reduction [3].

E. Maximum likelihood

This method specifies the likelihood of the noise-free ICA model and uses the maximum likelihood principle to estimate the parameters. The advantages of this method include the asymptotic efficiency of maximum likelihood estimates under regularity conditions. However, it is computationally intensive, which makes it undesirable in many practical situations [3].

F. Vector quantization [3]

Probably the simplest way of reducing dimensionality is to assign a class (among a total of K classes) to each one of the observations xn. This can be seen as an extreme case of dimensionality reduction in which we go from M dimensions to 1 (the discrete class label k). Each class k has a representative xk, which is the average of all the observations assigned to that class. If a vector xn has been assigned to the kn-th class, then its approximation after the dimensionality reduction is simply the class representative xkn. The widely used K-means method also belongs to this group of dimensionality reduction methods [5].

G. Principal curves, surfaces and manifolds

PCA is the perfect tool for reducing data that in their original M-dimensional space lie in a linear manifold. However, there are situations in which the data follow some curved structure. In this case, approximating the curve by a straight line will not give a good approximation of the original data. For such data the solution is to use principal curves, surfaces and manifolds [5].

Curve fitting is an important method of data analysis. When we obtain a fitting curve for the data, the dimension of the data is nonlinearly reduced to one [11].

H. Projection pursuit

PCA is ineffective in analyzing nonlinear structures, i.e. curves, surfaces or clusters. In such cases the Projection pursuit method may be a solution. The goal of projection pursuit is to find a projection that reveals interesting structures in the data. "Interesting" is defined as being "far from the normal distribution", i.e. the normal distribution is assumed to be the most uninteresting. The degree of being "far from the normal distribution" is measured by a projection index. There are various standards of interestingness [11].

I. Multidimensional scaling

Given n items in a d-dimensional space and an n × n matrix of proximity measures among the items, multidimensional scaling (MDS) produces a k-dimensional (k ≤ d) representation of the items such that the distances among the points in the new space reflect the proximities in the data [3].

The proximity measures the (dis)similarities among the items, and in general it is a distance measure: the more similar two items are, the smaller their distance. Popular distance measures are the Euclidean distance, the Manhattan distance and the maximum norm [3].
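A minimal sketch of metric MDS on toy data, using scikit-learn. The Sammon variant used later in this paper is not part of scikit-learn, so standard metric MDS stands in here, and the data sizes are illustrative rather than taken from our data set.

```python
import numpy as np
from sklearn.manifold import MDS

# Toy data: 100 items described by 5 variables (e.g., reduced financial ratios)
X = np.random.rand(100, 5)

# Metric MDS: find a 2-D configuration whose pairwise Euclidean distances
# approximate the distances between the original 5-D points.
mds = MDS(n_components=2, dissimilarity="euclidean", random_state=0)
X_2d = mds.fit_transform(X)
print(X_2d.shape)  # (100, 2) coordinates ready for plotting
```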
J. Kohonen's self-organizing maps

Self-Organizing Maps (SOM) are generalizations of the vector quantization approach presented above. SOM work by assigning to each input vector a label kn corresponding to the class whose representative vector is closest to it. Kohonen's SOMs start by creating a set of labels on a given manifold (usually a plane). The labels are distributed in a regular grid, and the topological neighborhood is defined by the neighbors of each grid point in the plane [5].

K. Support Vector Machines

Support vector machines (SVMs) have been recognized as one of the most successful classification methods [8]. SVMs are a powerful machine learning technique not only for classification but also for regression. They work by finding a hyperplane that best splits the target values. SVM performs classification by finding the hyperplane that maximizes the margin between the two classes. The vectors (cases) that define the hyperplane are the support vectors. The margin is the largest distance between the borderline data points. SVMs handle nonlinearity by mapping the data into a higher-dimensional space using kernel tricks [9].

L. Other methods

An important comparison of different methods was made by R. S. Rosaria, I. Adae et al. (2014) [12]. They concentrated on a few state-of-the-art methods for reducing input dimensionality and examined how these might affect the final classification accuracy. In particular, they tried removing data columns with too many missing values and used a low variance filter, which calculates the variance of each column and removes those columns whose variance falls below a given threshold. They also reduced highly correlated columns and implemented dimensionality reduction via tree ensembles (Random forests). But the best results were obtained by using backward feature elimination and forward feature construction together with removing data columns with too many missing values.

The Backward Feature Elimination loop performs the dimensionality reduction against a particular machine learning algorithm. Similarly to the Backward Feature Elimination approach, a Forward Feature Construction loop builds a number of pre-selected classifiers using an incremental number of input features. The Forward Feature Construction loop starts from 1 feature and adds one more feature at a time in the subsequent iterations [12].

III. ALGORITHM ADAPTATION FOR PARALLEL COMPUTING

In this section we briefly describe the MPI technology that is used for parallel computing in a computer cluster. We also present the algorithm of the selected dimensionality reduction method adapted for MPI.

A. MPI

MPI is a message-passing library interface specification. MPI primarily addresses the message-passing parallel programming model, in which data is moved from the address space of one process to that of another process through cooperative operations on each process [7]. Figure 2 shows the typical structure of a multi-core cluster. In the pure MPI case there is one MPI process per core.

Fig. 2. MPI scheme: multi-core CPUs with per-node shared memory, connected via a network

H. Jin, D. Jespersen (2011) [13] stated that the increasing number of cores in modern microprocessors is pushing current high performance computing systems into the petascale and exascale era. According to R. Rabenseifner, G. Hager, et al. (2009) [14], most systems in high-performance computing feature a hierarchical hardware design: shared memory nodes with several multi-core CPUs are connected via a network infrastructure.

A comparison between MPI and hybrid models was made by F. Cappello and D. Etiemble (2000) [15]. Their test results showed that a unified MPI approach is better for most of the benchmarks.

Rifat Chowdhury (2010) [1] designed an algorithm using OpenMP libraries to compute matrix multiplication efficiently. The speedup was close to the ideal 4 times when using four cores of a processor instead of one to compute the final matrix. This is an example of how parallel computing can increase performance.

B. Random Projection method

Our goal is to combine speed and accuracy. For the initial dimensionality reduction we selected the Random projection method because it is very fast. Moreover, we adapted this method for parallel computing. The method can be applied for many purposes. For example, G. E. Dahl, J. W. Stokes et al. [16] successfully used Random projections for large-scale malware classification to further reduce the dimensionality of the original input space. E. Bingham and H. Mannila [17] used Random projection as a dimensionality reduction tool for processing both noisy and noiseless images and for information retrieval in text documents. In our case we use this method for financial data mining.

To use the Random projection method, we first declare 3 matrices: an n × d matrix1 (containing the initial financial data), a d × k matrix2 (the operating matrix) and an n × k matrix3, which is our result matrix. In the next step we initialize all elements of matrix1 with random integer values. Then we assign values of 1 or -1 to matrix2, each with probability 0.5. Then matrix1 and matrix2 are multiplied, and the results are assigned to the elements of matrix3.

The code is constructed according to MPI requirements (Fig. 3). The program includes the MPI library. All matrices and variables are declared in the serial code region, but the MPI environment is initialized by special functions before the parallel region begins. The program uses a determined number of processes ("ranks"). The "zero" rank is used to divide the multiplication task into multiple processes and to handle the incoming results from the remaining ranks. All computing operations are done simultaneously by multiple ranks.

Fig. 3. Random projection algorithm for parallelization with MPI
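The authors' MPI program (Fig. 3) is not reproduced in the text; the following mpi4py sketch illustrates the same row-partitioned scheme, with rank 0 distributing blocks of matrix1 and gathering the partial products. The sizes and variable names mirror the description above but are otherwise our own assumptions, not the original implementation.

```python
# Row-partitioned random projection with MPI (illustrative mpi4py sketch).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n, d, k = 1400, 54, 5                         # stocks x ratios, reduced to k dimensions

if rank == 0:
    matrix1 = np.random.rand(n, d)                         # initial financial data
    matrix2 = np.random.choice([-1.0, 1.0], size=(d, k))   # +-1 with probability 0.5
    chunks = np.array_split(matrix1, size, axis=0)         # one block of rows per rank
else:
    matrix2, chunks = None, None

# The "zero" rank distributes the work; every rank receives the projection matrix.
local_rows = comm.scatter(chunks, root=0)
matrix2 = comm.bcast(matrix2, root=0)

local_result = local_rows @ matrix2           # each rank multiplies only its own rows

# Rank 0 gathers the partial results into the n x k result matrix (matrix3).
gathered = comm.gather(local_result, root=0)
if rank == 0:
    matrix3 = np.vstack(gathered)
    print(matrix3.shape)                      # (1400, 5)
```

Run, for example, with `mpiexec -n 8 python random_projection_mpi.py` (the file name is hypothetical). Each rank works on an independent block of rows, which is why the speedup over the serial code grows with the amount of data per rank.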
IV. RESEARCH RESULTS

In this section we present the results of applying the Random projection and Multidimensional scaling methods to our financial market data set.

The data set contains information about 1400 companies traded on the NASDAQ stock exchange. Every company is described by 54 different parameters. All these parameters are grouped into 6 categories: overview, valuation, financial, performance, technical and ownership. All data come from the finviz.com website.

The Overview group contains such parameters as sector, industry, market capitalization, price and volume. In the Valuation group we have P/E, PEG, P/B, EPS and sales parameters. The Financial group contains ROA, ROE, ROI, earnings, debt and margin parameters. The Performance indicators present price changes over different periods of time, volatility and recommendation values. The Technical indicators are ATR, Beta, simple moving averages over different periods of time, RSI, and the lowest and highest prices at which a stock has traded in the previous 52 weeks. The last group contains stock ownership parameters.

Fig. 5. Parameters of the analysed companies

Usually our goal is to visualize data and uncover hidden information. This means that, for example, in this case we have to reduce the number of dimensions from 54 to 2 or 3. It is complicated to perform such tasks quickly on a single machine. So in the first step we executed the Random projection algorithm in a cluster having 120 nodes to reduce the dimensions from the initial 54 to a smaller number: 25, 15, 10 and 5 (obtaining several new data sets). When the most suitable number of processes was chosen, it took less than 0.04 seconds to complete each task (Figure 4). It should be noted that using more processes does not always lead to better performance. As this data set was relatively small, 10 to 30 processes were the optimal choice for executing the algorithm in the fastest way. Compared with the serial code, the MPI code worked 10 to 20 times faster.

Fig. 4. The execution times of the Random projection algorithm

In the second step we applied the Multidimensional scaling method (with Sammon mapping) to visualize the reduced data set containing 5 variables. We investigated two stock classification cases: one based on analysts' recommendations and one based on a multicriteria indicator.
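A condensed sketch of this two-step pipeline (random projection from 54 to 5 variables, then MDS down to 2 dimensions for plotting). Scikit-learn's metric MDS stands in for the Sammon mapping used here, and the data are random placeholders rather than the finviz.com ratios.

```python
# Two-step pipeline from Section IV: Random projection (54 -> 5), then MDS to 2-D.
import numpy as np
from sklearn.manifold import MDS

data = np.random.rand(1400, 54)                    # 1400 companies x 54 ratios (placeholder)
R = np.random.choice([-1.0, 1.0], size=(54, 5))    # random +-1 projection matrix
reduced = data @ R                                 # step 1: 1400 x 5

embedding = MDS(n_components=2, random_state=0).fit_transform(reduced)  # step 2
print(embedding.shape)                             # (1400, 2) points to plot
```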
A. Stock classification based on analysts' recommendations

One of our initial 54 factors was the recommendation of financial analysts. This variable ranges from 1 (a strong recommendation to buy the stock) to 5 (a strong recommendation to sell the stock). We raised the hypothesis that "good" companies (having a recommendation value from 1 to 2) should be separated from the rest of the companies after the dimensionality reduction and visualization steps.

However, figure 6 shows that the combination of the Random projection and Multidimensional scaling methods could not separate these two classes; they simply overlapped each other. We tried to use a 3D plot, but it was uninformative too (Fig. 7).

Fig. 6. Data visualization using Multidimensional scaling

Fig. 7. Data visualization using Multidimensional scaling (3D plot)

The first thought could be that the methods are not efficient enough. But it could also be that in reality the potential of "good" and "bad" companies (as determined by analysts) does not differ that much. This would explain why relying on the opinion of financial advisors often leads to unstable returns or even losses. It also suggests that such opinions may be based more on intuition than on the various stock ratios.

B. Stock classification based on a multicriteria indicator

In the second case we constructed a multicriteria indicator for separating stocks that are worth investing in ("good"), neutral stocks and "bad" stocks. In our assumption, "good" stocks meet all of the following criteria: P/E < 20, EPS this year > 5, ROE > 5, 50-period SMA > 0, RSI > 50. Table I shows the ratios by which the "good" and "bad" classes of stocks are constructed. "Neutral" stocks are those which do not fall within either of the two previously defined classes (a labelling sketch is given at the end of this subsection).

TABLE I. MULTICRITERIA INDICATOR FOR STOCK CLASSIFICATION

                  P/E     EPS this Year %   ROE    SMA50 %   RSI
  "Good stocks"   < 20    > 5               > 5    > 0       > 50
  "Bad stocks"    > 20    < 5               < 5    < 0       < 50

We used the same Multidimensional scaling method to visualize the reduced data set, so the arrangement of the data points is the same as presented in figure 6. But in this case we found that the three classes of stocks were separated quite well.

Figure 8 shows that the "good" companies appear on the left side of the plot. The "bad" and "neutral" companies overlapped, but the "bad" companies were separated from the "good" ones.

Fig. 8. Visualizing three classes of stocks

This leads to the suggestion that dimensionality reduction and visualization methods can effectively classify data and find the most promising stocks. However, in order to explain the differences between the classes we need to use several different ratios.

We used the combination of the Multidimensional scaling and Random projection methods because the latter is very fast. However, it may not be as accurate. So we also applied the MDS method to the full initial data set, and it actually led to better results: the 3D plot in figure 9 shows that the "bad" stocks were separated from the "neutral" stocks.

Fig. 9. Visualizing the initial data set using MDS
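The multicriteria rule in Table I can be expressed as a simple labelling function, sketched below with pandas. The column names and the toy values are assumptions for illustration, not the exact finviz.com field names or our data.

```python
# Hedged sketch of the multicriteria labelling from Table I.
import pandas as pd

def classify(row):
    good = (row["P/E"] < 20 and row["EPS this Y %"] > 5 and
            row["ROE"] > 5 and row["SMA50 %"] > 0 and row["RSI"] > 50)
    bad = (row["P/E"] > 20 and row["EPS this Y %"] < 5 and
           row["ROE"] < 5 and row["SMA50 %"] < 0 and row["RSI"] < 50)
    return "good" if good else "bad" if bad else "neutral"

stocks = pd.DataFrame({
    "P/E":          [12.0, 35.0, 18.0],
    "EPS this Y %": [ 8.0, -2.0,  6.0],
    "ROE":          [14.0,  1.0,  3.0],
    "SMA50 %":      [ 2.5, -4.0,  1.0],
    "RSI":          [61.0, 38.0, 55.0],
})
stocks["class"] = stocks.apply(classify, axis=1)
print(stocks["class"].tolist())   # ['good', 'bad', 'neutral']
```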
V. CONCLUSIONS

Various data mining methods are used for examining large financial data sets to uncover hidden and useful information. This research focused on financial data visualization based on dimensionality reduction methods. We used a data set that contained financial ratios of stocks traded on the NASDAQ stock exchange. A brief overview of the most popular dimensionality reduction and visualization methods was presented in this paper. We also showed how to adjust the algorithm of the Random projection method for parallel computing. MPI technology was applied in a computer cluster to perform dimensionality reduction. The performance results revealed the advantages of parallel computing.

Our goal was to visualize data and uncover hidden information. In order to do this we had to reduce the number of dimensions to 2 or 3. In the first step we executed the Random projection algorithm in the cluster to reduce the dimensions from the initial 54 to a smaller number. In the second step we applied the Multidimensional scaling method to visualize the reduced data set.

One of the data set ratios was the recommendation of financial analysts. We raised the hypothesis that companies with a recommendation to buy should be separated from the rest of the companies. But the results showed that the combination of the Random projection and Multidimensional scaling methods could not do this. This might have happened because in reality the potential of "good" and "bad" companies (as determined by analysts' recommendations) does not differ that much. However, in the second case of stock classification, based on the multicriteria indicator, the "good" and "bad" stocks were separated quite well. This leads to the suggestion that dimensionality reduction and visualization methods can effectively classify data and find the most promising stocks. But in order to explain the differences between the classes we need to use several different ratios. It should also be noted that the MDS method alone was more accurate than the combination of the two methods.

REFERENCES

[1] R. Chowdhury, "Parallel Computing with OpenMP to solve matrix Multiplication," UCONN BIOGRID REU Summer 2010, Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269.
[2] Dr. Domeniconi, "Comparison of Principal Component Analysis and Random Projection in Text Mining," INFS 795, April 29, 2004.
[3] I. K. Fodor, "A survey of dimension reduction techniques," Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, June 2002.
[4] Intel Parallel Studio. Available: https://software.intel.com/en-us/intel-parallel-studio-xe
[5] C. O. S. Sorzano, J. Vargas, A. Pascual Montano, "A survey of dimensionality reduction techniques," 2014. Available: arXiv:1403.2877.
[6] A. K. Menon, "Random projections and applications to dimensionality reduction," School of Information Technologies, The University of Sydney, 2007.
[7] MPI technology. Available: http://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
[8] H. Kim, P. Howland, H. Park, "Dimension Reduction in Text Classification with Support Vector Machines," Journal of Machine Learning Research 6 (2005) 37-53.
[9] S. Kudyba, Big Data, Mining, and Analytics: Components of Strategic Decision Making. Auerbach Publications, 2014. ISBN 9781466568709.
[10] Message Passing Interface. Available: https://computing.llnl.gov/tutorials/mpi/
[11] M. Mizuta, "Dimension Reduction Methods," Papers, Humboldt-Universität Berlin, Center for Applied Statistics and Economics (CASE), 15, 2004.
[12] R. S. Rosaria, I. Adae, A. Hart, M. Berthold, "Seven Techniques for Dimensionality Reduction," KNIME, 2014.
[13] H. Jin, D. Jespersen, et al., "High performance computing using MPI and OpenMP on multi-core parallel systems," Parallel Computing 37 (2011) 562-575.
[14] R. Rabenseifner, G. Hager, G. Jost, "Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes," Proceedings of the 17th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2009, Weimar, Germany, 18-20 February 2009.
[15] F. Cappello, D. Etiemble, "MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks," Supercomputing, ACM/IEEE 2000 Conference. ISSN: 1063-9535.
[16] G. E. Dahl, J. W. Stokes, L. Deng, D. Yu, "Large-scale malware classification using random projections and neural networks," Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference, pp. 3422-3426, 2013.
[17] E. Bingham, H. Mannila, "Random projection in dimensionality reduction: Applications to image and text data," Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 245-250, 2001.