<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Clustering underlying stock trends via non-negative matrix factorization</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Andrea</forename><surname>Pazienza</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica</orgName>
								<orgName type="institution">Università di Bari</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sabrina</forename><forename type="middle">Francesca</forename><surname>Pellegrino</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Dipartimento di Matematica</orgName>
								<orgName type="institution">Università di Bari</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Stefano</forename><surname>Ferilli</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica</orgName>
								<orgName type="institution">Università di Bari</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Floriana</forename><surname>Esposito</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica</orgName>
								<orgName type="institution">Università di Bari</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Clustering underlying stock trends via non-negative matrix factorization</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">61CF704E4B95F21EA21449E07CA14AAA</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T12:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Building a diversified portfolio is an appealing strategy in the analysis of stock market dynamics. It aims at reducing risk in market capital investments. Grouping stocks by similar latent trend can be cast as a clustering problem. The classical K-Means clustering algorithm is ill-suited to the task of financial data analysis. Hence, we investigate Non-negative Matrix Factorization (NMF) techniques which, contrary to K-Means, turn out to be very effective when applied to stock data. In particular, recently developed NMF techniques, which incorporate convexity constraints, generate more disjoint latent trend groupings than the traditional sector-based groupings. In this paper, the NMF technique and its variants are applied to NASDAQ stock data (i.e., daily closing prices). Experimental results confirm that (convex) NMF techniques are highly recommended to produce trend-based assets and build a well-diversified portfolio.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>A trader's purpose is to beat the market and thereby make money. To achieve this objective, the trader should be able to predict future stock prices, and could then determine a self-financing trading strategy that maximizes portfolio return <ref type="bibr" target="#b4">[5]</ref>. However, because of randomness in the market, creating and managing successful portfolios of financial assets is difficult. Diversification is the practice most widely used by individuals to build portfolios. It is based on the principle of maximizing expected return for a given amount of risk or, equivalently, minimizing risk for a given amount of return <ref type="bibr" target="#b8">[9]</ref>. It is one of the most effective ways to get low risk-reward ratios. This problem can be seen as a clustering process, in which the aim is to group data (e.g., stocks) into subgroups of similar behavior (e.g., market trend).</p><p>Clustering is arguably one of the most common steps in unsupervised behavior analysis. With K-Means <ref type="bibr" target="#b7">[8]</ref>, a classical clustering algorithm, it is not possible to establish the effectiveness and coherence of the clusters when dealing with stock data. Therefore, more powerful analysis techniques are required. Matrix decomposition strategies can overcome this problem: they can produce cleaner data, which may lead to a better interpretation of the results. Since the closing prices of stocks are non-negative signals, it makes sense to apply Non-negative Matrix Factorization (NMF) to them.</p><p>In this paper, we use NMF and its variants to learn the components that drive the stock market, and to construct a diversification method using cluster analysis of financial assets. We compare NMF with its two variants, Convex NMF (C-NMF) and Convex-Hull NMF (CH-NMF), obtained by imposing orthogonality and convexity constraints. We investigate the impact of these constraints on real stock return data.</p><p>Finally, we conclude that CH-NMF yields a more accurate and disjoint representation of the data, and allows a better interpretation of the clustering. Moreover, CH-NMF is very useful because, on our data, it converges in only one iteration, with very low runtime and, most importantly, a Frobenius-norm error of the same order of magnitude as the other methods.</p><p>This paper is organized as follows. The next section recalls useful background information, including related work. Then, Section 3 introduces the mathematical aspects of NMF and its variants C-NMF and CH-NMF. Section 4 describes experimental results and evaluates the difference between the methods in terms of number of iterations, error and clustering representation. Finally, Section 5 concludes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background and Related Work</head><p>Portfolio Diversification is the process of choosing investments in order to reduce exposure to a particular asset. This is typically done by investing in a variety of assets because, if the stock prices do not move together, a diversified portfolio of assets will have lower variance than the weighted average variance of the assets. A straightforward diversification breaks the stock market into several classic sectors according to the primary activities of a company (such as Basic Materials, Technology, Financial, Health Care, etc.).</p><p>As the stock evolution is comparable to a stochastic process, stock prices are determined by fluctuations in underlying or latent trends, which can be modeled by a Brownian motion <ref type="bibr" target="#b4">[5]</ref>. Therefore, stocks in the same sector may not show similar behavior in the market. So, although the sector-based strategy is simple to apply, the resulting portfolio does not ensure the maximum return. Hence, if one were able to identify and predict the underlying trends from the stock market data, one would have the opportunity to leverage this knowledge to obtain genuine portfolio diversification opportunities. In other words, investors should diversify their money not only across different sectors, but also across different trends (i.e., clusters).</p><p>Unfortunately, the K-Means clustering algorithm still has some limitations in the exploitation of financial data. Indeed, it is stated in <ref type="bibr" target="#b0">[1]</ref> that K-Means clustering tends to find spherical clusters, so centroid-based clustering does not handle noise well. Hence, the authors aimed to discover other centroid-based clustering approaches for financial datasets and to introduce weighted Euclidean distance instead of standard Euclidean distance to re-evaluate centroid-based clusters, so as to overcome the limitations of K-Means.</p><p>Matrix decompositions, especially NMF, are used in the literature for the analysis of financial data. In particular, <ref type="bibr" target="#b3">[4]</ref> applied a constrained NMF to real stock data and found that there is a tradeoff between smoothness of trend and sparsity of the weight matrix. <ref type="bibr" target="#b1">[2]</ref> contributes to Diversification Theory by comparing Semi-NMF with Sparse-semiNMF applied to a diffusive model based on the Black-Scholes equation for option pricing. It is deduced that Sparse-semiNMF outperforms Semi-NMF because it better reduces the risk related to portfolio selection. Using the multiplicative update rules algorithm, <ref type="bibr" target="#b6">[7]</ref> analyzes the behavior of latent trends for different numbers of latent forces and shows that increasing the number of underlying forces does not affect the trends of the original forces. In <ref type="bibr" target="#b10">[11]</ref> a variant of Sparse-semiNMF with sum-to-one and smoothness constraints is applied to the portfolio diversification problem, and results show a disjoint data representation that allows a better understanding of stock properties.</p><p>In our setting, assume the market is made up of m stocks S_1, S_2, ..., S_m; each stock S_i is stored as a row vector whose entries are n daily closing prices. Suppose there are k latent bases, W_1, W_2, ..., W_k; each W_j is an n-dimensional row vector, which can be thought of as a Brownian motion. 
So each stock can be expressed as a linear combination of these bases:</p><formula xml:id="formula_0">(1) S_i = \sum_{j=1}^{k} h_{ij} W_j,</formula><p>where h_{ij} is a non-negative real number that indicates the degree of association of the i-th stock with the basis W_j. In matrix notation, (1) can be expressed as</p><formula xml:id="formula_1">(2) S_{+} = H_{+} W_{\pm},</formula><p>where S ∈ R^{m×n}_{+}, H ∈ R^{m×k}_{+} (weight matrix) and W ∈ R^{k×n}_{±} (trend matrix). This strongly recalls the non-negative matrix factorization formulation.</p></div>
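<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal sketch of Eq. (1), the following Python snippet builds a toy market in which each stock is a non-negative mix of latent trends. The helper name mix_trends and all numbers are illustrative assumptions, not values from the experiments; plain lists are used only to keep the example dependency-free.</p>
<p>
```python
# Toy illustration of Eq. (1): row i of S is sum_j H[i][j] * W[j].
# All values below are made up for illustration.

def mix_trends(H, W):
    """Return S = H W for matrices stored as lists of rows."""
    n = len(W[0])
    return [
        [sum(h_ij * W_j[t] for h_ij, W_j in zip(H_i, W)) for t in range(n)]
        for H_i in H
    ]

# k = 2 latent trends observed over n = 4 days.
W = [
    [1.0, 2.0, 3.0, 4.0],     # steadily rising trend
    [1.0, -1.0, 1.0, -1.0],   # oscillating trend with mixed signs
]

# m = 3 stocks with non-negative association degrees h_ij.
H = [
    [1.0, 0.0],   # follows the first trend only
    [2.0, 1.0],   # mostly the first trend, plus some oscillation
    [1.0, 0.5],   # same mix as the previous stock at half the scale
]

S = mix_trends(H, W)
for row in S:
    print(row)
```
</p>
<p>In this toy case the second trend has mixed signs, as allowed for W in Eq. (2), while the weights in H and the resulting rows of S stay non-negative.</p></div>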
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Non-Negative Matrix Factorization</head><p>The standard definition for non-negative matrix factorization of a matrix S is</p><formula xml:id="formula_2">(3) S = H W,</formula><p>where S ∈ R^{m×n}, H ∈ R^{m×k} and W ∈ R^{k×n}, and k ≤ m. Both W and H must contain only non-negative entries. W is the matrix of factors and H is the mixing matrix.</p><p>According to <ref type="bibr" target="#b5">[6]</ref>, each data point, which is represented as a row in S, can be approximated by an additive combination of the non-negative basis vectors, which are represented as rows in W, weighted by the components of H. Matrices H and W are found by solving the optimization problem</p><formula xml:id="formula_3">(4) \min_{H \geq 0,\, W \geq 0} \| S - H W \|_F^2,</formula><p>where ‖·‖_F is the Frobenius norm. The algorithm is expressed in terms of a pair of update rules that are applied alternately:</p><formula xml:id="formula_4">H_{ij} = H_{ij} \sum_{k} \frac{S_{ik}}{(H W)_{ik}} W_{jk}, \qquad W_{ij} = W_{ij} \sum_{k} H_{ki} \frac{S_{kj}}{(H W)_{kj}}.</formula><p>Matrices H and W are initialized at random. Several variants and improvements of NMF have been introduced in recent years <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b9">10]</ref>.</p><p>Convex NMF. Convex non-negative matrix factorization (C-NMF) <ref type="bibr" target="#b2">[3]</ref> allows the data matrix S to have mixed signs. 
It minimizes</p><formula xml:id="formula_5">\| S - S H W \|_F^2 \quad \text{subject to} \quad \| H_i \|_1 = 1, \; H \geq 0,</formula><p>where S ∈ R^{m×n}, H ∈ R^{n×k} and W ∈ R^{k×n}, and the condition on H is the convexity constraint. Matrices H and W are updated iteratively, until convergence, using the following update rules:</p><formula xml:id="formula_6">H_{ik} = H_{ik} \frac{(Y^{+} W)_{ik} + (Y^{-} H W^{T} W)_{ik}}{(Y^{-} W)_{ik} + (Y^{+} H W^{T} W)_{ik}}, \qquad W_{ik} = W_{ik} \frac{(Y^{+} H)_{ik} + (W H^{T} Y^{-} H)_{ik}}{(Y^{-} H)_{ik} + (W H^{T} Y^{+} H)_{ik}},</formula><p>where Y = S^T S, and matrices Y^+ and Y^- are given by</p><formula xml:id="formula_7">Y^{+}_{ik} = \frac{1}{2} \left( |Y_{ik}| + Y_{ik} \right) \quad \text{and} \quad Y^{-}_{ik} = \frac{1}{2} \left( |Y_{ik}| - Y_{ik} \right),</formula><p>respectively. Matrices H and W are initialized at random. The convexity constraint imposed on H has the advantage that one might interpret the rows of H as weighted sums of certain data points. This means that these rows can be interpreted as centroids. Moreover, C-NMF solutions are generally significantly more orthogonal than Semi-NMF solutions.</p><p>Convex-Hull NMF. Massive datasets are likely to capture even very rare aspects of the problem at hand. Along this line, <ref type="bibr" target="#b9">[10]</ref> recently introduced a data-driven Convex NMF approach, called Convex-Hull NMF (CH-NMF), that is fast and scales extremely well: it can efficiently factorize huge matrices and in turn extract meaningful "clusters" from massive datasets. The key idea is to restrict the clusters to be combinations of vertices of the convex hull of the dataset; this makes it possible to explore the data itself directly to solve the convex NMF problem.</p><p>We consider a factorization of the form S = S H W where S ∈ R^{m×n}, H ∈ R^{n×k} and W ∈ R^{k×n}. We further restrict the rows of H and W to convexity, i.e.,</p><formula xml:id="formula_8">\| H_i \|_1 = 1, \; H \geq 0, \qquad \| W_j \|_1 = 1, \; W \geq 0.</formula><p>In contrast to C-NMF, we consider a convex combination on both H and W. The task now is to minimize</p><formula xml:id="formula_9">(5) \| S - S H W \|_F^2 \quad \text{s.t.} \quad \| H_i \|_1 = 1, \; H \geq 0, \; \| W_j \|_1 = 1, \; W \geq 0.</formula><p>This optimization problem is equivalent to projecting the solution in the convex hull of S. Convexity constraints yield latent components with some useful properties. First, any data point can be expressed as a convex and meaningful combination of these basis vectors; this offers interesting new opportunities for data interpretation. Second, they span a simplex that encloses most of the remaining data. CH-NMF aims at a data factorization based on the data points residing on the data convex hull. Therefore, CH-NMF seeks an approximate solution by subsampling the convex hull, exploiting each data point on the convex hull as a linear lower-dimensional projection of the original data.</p><p>A consequence of the convexity of H and W is that the rows of H tend to become nearly orthogonal. Requiring orthogonal rows for H makes stocks attracted to different clusters uncorrelated. This indicates a more accurate clustering and, hence, that the NMF family is competitive with K-Means for the purposes of clustering financial data. Therefore, the aim of this paper is to exploit NMF techniques, and in particular the ones with convexity constraints, in the context of financial data.</p></div>
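<div xmlns="http://www.tei-c.org/ns/1.0"><p>The alternating scheme for plain NMF can be sketched in a few lines of Python. Several assumptions apply: this sketch uses the Frobenius-norm multiplicative updates of Lee and Seung, H = H * (S W^T) / (H W W^T) and W = W * (H^T S) / (H^T H W) taken element-wise, rather than the ratio-based rules printed in Section 3, and it is not the pymf implementation used in the experiments. The function names, the small constant eps and the toy matrix are hypothetical. C-NMF and CH-NMF are not covered.</p>
<p>
```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def scale(X, num, den, eps):
    """Element-wise X * num / (den + eps); eps guards against division by zero."""
    return [[x * a / (b + eps) for x, a, b in zip(xr, nr, dr)]
            for xr, nr, dr in zip(X, num, den)]

def frobenius_error(S, H, W):
    """Return the Frobenius norm of S - H W."""
    R = matmul(H, W)
    return sum((s - r) ** 2 for sr, rr in zip(S, R) for s, r in zip(sr, rr)) ** 0.5

def nmf(S, k, iterations=200, seed=0, eps=1e-9):
    """Approximate non-negative S (m x n) by H (m x k) times W (k x n)."""
    rng = random.Random(seed)
    m, n = len(S), len(S[0])
    # Random non-negative initialization, as in the text.
    H = [[rng.random() for _ in range(k)] for _ in range(m)]
    W = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # Update H with W fixed.
        Wt = transpose(W)
        H = scale(H, matmul(S, Wt), matmul(H, matmul(W, Wt)), eps)
        # Update W with the new H fixed.
        Ht = transpose(H)
        W = scale(W, matmul(Ht, S), matmul(matmul(Ht, H), W), eps)
    return H, W

# Toy non-negative data: 3 "stocks" over 4 "days" (made-up values).
S = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 3.0, 7.0, 7.0],
    [1.5, 1.5, 3.5, 3.5],
]
H, W = nmf(S, k=2)
error = frobenius_error(S, H, W)
print(round(error, 4))
```
</p>
<p>Because the updates only multiply non-negative quantities, H and W stay non-negative throughout, which is the property that makes the factors readable as weights and trends.</p></div>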
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Evaluations and Comparisons of NMF Techniques</head><p>The objective of this work is to study the application of clustering approaches to determine a trend-based portfolio diversification that is more consistent than the sector-based one. We ran experiments on stock data gathered from the NASDAQ Stock Market (http://www.nasdaq.com), specifically on the closing prices of 28 stocks over the past 10 years (2518 working days). Table <ref type="table" target="#tab_0">1</ref> lists the stocks involved in the experiments and the sector each belongs to. Fig. <ref type="figure" target="#fig_1">1(a)</ref> shows the actual stock prices, useful as a reference against which to compare the numerical solutions obtained with the different methods. We want to identify the subgroups (clusters) of stocks that show a similar trend. In particular, we want to assess the performance of NMF, C-NMF and CH-NMF. We try different numbers k of clusters. Since the considered stocks involve only 8 market sectors overall, we ran the methods 6 times, once for each k ∈ {3, 4, ..., 8}. We used a Python implementation of these methods available at http://pymf.googlecode.com.</p><p>The raw data were preprocessed as log-returns and stored in a matrix S ∈ R^{28×2518}. In this way the variances of the stock data distributions are homogeneous, which allows a better understanding of the graphical results. We compute two matrices H and W for the given stock data matrix S. H and W are used to identify the k cluster labels of the stocks: the rows of W represent the cluster centroids, while H is the cluster membership indicator matrix. In other words, sample i is in cluster j if H_{ij} is the largest value in row H_{i,:}. 
The product H W, representing the reconstruction of S, makes explicit the difference between the original stock price data and the approximated dynamics of the transformed data.</p><p>Table <ref type="table" target="#tab_1">2</ref> reports, for each k, statistics about the results obtained by the NMF methods: number of iterations to convergence, error in Frobenius norm, and number of non-empty clusters in the result. We note that, in most cases, all decomposition methods provide a good reconstruction of the data. They differ in the cost of achieving a good decomposition (the larger the number of iterations, the more expensive the method). We want to highlight that CH-NMF is very fast and that its Frobenius error is of the same order of magnitude as that of the other methods. In fact, even when we set a higher maximum number of iterations, CH-NMF needs only 1 iteration to converge on this particular dataset, as reported in Table <ref type="table" target="#tab_1">2</ref>. As for the other techniques involved in the experiments, NMF requires up to 5000 iterations in the worst case, while C-NMF takes fewer iterations than NMF to converge (500 for every k) but always discovers only two non-empty clusters. This is very unattractive because considering only 2 trends would expose investors to high risk: they would diversify their portfolio with only 2 stocks, one for each trend. The role of k is to force a representation of the data that is more compact than its actual form. A more compact representation is assumed to capture underlying regularities in the data that might be obscured by the form in which the data is found in matrix S. The target is to achieve a low-rank approximation that ensures a good interpretation of the data in terms of clustering partition, data reconstruction and compactness of representation. 
Indeed, for a genuine portfolio diversification, choosing k = 4 represents a fair compromise between a lower-rank approximation and the goal of yielding a good cluster partition. The reconstruction obtained for k = 4 is shown in Fig. <ref type="figure" target="#fig_1">1(b,c,d</ref>).</p><p>Now, we focus on the study of W, i.e., the matrix that represents the latent trends. As shown in Fig. <ref type="figure" target="#fig_3">2</ref>, the trends obtained from NMF show increasing fluctuations as k grows, despite the good quality of the graphical reconstruction. In Fig. <ref type="figure" target="#fig_2">3</ref> we can see that C-NMF shows too many fluctuations and does not allow us to compose a well-diversified portfolio. Indeed, as shown in Table <ref type="table" target="#tab_1">2</ref>, for every k, C-NMF always yields only 2 clusters, which reflect the trends that can be seen in the figures. A reason for this behavior is that C-NMF imposes the convexity constraint only on H. Hence, convexity on H should be used together with convexity on W. In fact, CH-NMF is able to overcome this problem and leads to more regular components (cf. Fig. <ref type="figure" target="#fig_5">4</ref>). Therefore, we can state that the convexity constraint on H and W provides a good adjustment: the bases become more disjoint and the Frobenius norm decreases at a speed that depends on the number of iterations. It is important to note that while all methods try to minimize the same criterion, they impose different constraints and thus yield different matrix factors. For example, CH-NMF assumes W and H to be non-negative matrices and often leads to sparse representations of the data.</p><p>Another important graphical confirmation of our proposal can be found in the analysis of colormaps, which are a good way to display matrices in scaled colors. 
A colormap is a color-filled table in which every color indicates the weight of the corresponding matrix component according to a total order given by a color scale called colorbar. In this way, we can evaluate the components of greater weight associated with latent trends. More precisely, every H_{ij} indicates how the i-th stock is related to the row basis W_j. In this representation too, we can observe the degree to which each data point belongs to a cluster by selecting, for each row, the highest element in the colormap. In the case of NMF (see Fig. <ref type="figure" target="#fig_7">5</ref>(a)), the colormap displays a regular progression of the data with a high peak in some rows. This determines the membership of the element in a row to the cluster in the corresponding column. For C-NMF (see Fig. <ref type="figure" target="#fig_7">5(b</ref>)), instead, the elements with the highest color on the colorbar are located in the first or last column, which emphasizes the fact that the resulting clusters are always 2. Regarding the colormap for CH-NMF (see Fig. <ref type="figure" target="#fig_7">5(c</ref>)), we can observe a clearer separation of columns, meaning that the resulting clusters are readily visible. Compared to the K-Means algorithm, used as baseline, the main difference lies in the creation of clusters. In fact, K-Means always produces exactly k clusters, while NMF methods generate at most k clusters, as shown in Table <ref type="table" target="#tab_1">2</ref>. This means that there are centroids that do not attract any stocks.</p><p>After analyzing all decompositions, our main purpose is to obtain clusters of stocks with the same trend, starting from a matrix decomposition of the data in such a way that W holds the centroid coordinates and H holds the degree of relationship to the different centroids. 
We collect every i-th row of H, which corresponds to the i-th stock index in Table <ref type="table" target="#tab_0">1</ref>, into its own membership cluster in order to discover, for each k, which subgroups of stocks remain unchanged. Thus, the most frequent subgroups of stocks can be chosen as the final outcome of our portfolio diversification strategy. More precisely, an attractive portfolio could be constructed by selecting the most promising stocks from each subgroup. We implemented this functionality in MATLAB in order to extract a cluster matrix while varying both k and the decomposition method. In Tables <ref type="table" target="#tab_4">3-5</ref>, we show the resulting clusters of stock data: we see that clusters of stocks persist across the different decomposition methods. To give some concrete examples, stocks with index 1, 2, 9, 27 (and often 12 too) are almost always grouped together. They correspond to red indexes in the tables. Summing up, the obtained clusters demonstrate the successful application of CH-NMF to the analysis of financial data. This means that CH-NMF is robust for stock market analysis. Moreover, it provides a trend-based diversification containing groups of different sectors. The most interesting result is that stocks of the same sector are not necessarily assigned to the same cluster and vice versa, which is of potential use in guiding the construction of a diversified portfolio.</p></div>
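<div xmlns="http://www.tei-c.org/ns/1.0"><p>The preprocessing and labelling steps described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB code used for the experiments: the function names log_returns, assign_clusters and group_by_cluster are hypothetical, and the price series and the weight matrix H are made-up toy values.</p>
<p>
```python
import math

def log_returns(prices):
    """Return log(p[t] / p[t-1]) for consecutive closing prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def assign_clusters(H):
    """Map each row i of H to the column j holding its largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in H]

def group_by_cluster(labels):
    """Collect 1-based stock indices per cluster; empty clusters do not appear."""
    groups = {}
    for stock, cluster in enumerate(labels, start=1):
        groups.setdefault(cluster, []).append(stock)
    return groups

# Toy closing prices for one stock (made-up values).
returns = log_returns([100.0, 110.0, 99.0])

# Toy weight matrix for 4 stocks and k = 3 requested clusters (made-up values).
H = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.5, 0.4, 0.1],
]
labels = assign_clusters(H)
groups = group_by_cluster(labels)
print(labels)
print(groups)
```
</p>
<p>In this toy case no stock has its largest weight in the middle column, so only two of the three requested clusters are non-empty, which mirrors the observation that NMF methods generate at most k clusters.</p></div>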
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions</head><p>Constructing a diversified portfolio, in which the correlation between constituent asset classes and investment strategies is meaningfully low so as to reduce the exposure to risk by investing in a variety of assets, can be challenging. Our aim is to group stocks having a similar trend. This can be cast as a clustering problem in data mining, which we solve with NMF techniques. We investigate NMF and its variants with convexity constraints to improve the exploitation of similar stock trends. In particular, we show that, for this task, CH-NMF is very fast and scalable in terms of both speed and reconstruction quality. Our extensive experimental evaluation shows that NMF techniques better bring out the clustering properties, additionally yielding very low error in Frobenius norm and high efficiency in terms of convergence time. Furthermore, we compared the resulting clusters to check whether frequent itemsets of stocks still stay together as the number of requested clusters changes. The task of prediction is not applicable to NMF techniques because the number of clusters to be discovered is given as input.</p><p>As future work, we will use more datasets from different markets and will investigate further decomposition techniques that can improve the effectiveness of clustering stock data and impose other penalty constraints in order to achieve a better portfolio diversification strategy, reduce the risk of investments and, hence, beat the market.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>(a)</head><label>a</label><figDesc>Actual stock prices data. (b) NMF. (c) C-NMF. (d) CH-NMF.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: Original stock data with reconstruction for k = 4</figDesc><graphic coords="7,164.74,222.81,138.33,71.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>(a) k = 3 .</head><label>3</label><figDesc>(b) k = 4. (c) k = 5. (d) k = 6. (e) k = 7. (f) k = 8.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 2 :</head><label>2</label><figDesc>Fig. 2: Trends for NMF</figDesc><graphic coords="8,255.80,250.42,103.75,53.36" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 3 :</head><label>3</label><figDesc>Fig. 3: Trends for C-NMF</figDesc><graphic coords="8,255.80,527.91,103.74,53.33" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Fig. 4 :</head><label>4</label><figDesc>Fig. 4: Trends for CH-NMF</figDesc><graphic coords="9,142.84,205.64,103.74,53.43" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head></head><label></label><figDesc>(a) NMF. (b) C-NMF. (c) CH-NMF.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Fig. 5 :</head><label>5</label><figDesc>Fig. 5: Colormaps for k = 4</figDesc><graphic coords="9,264.45,556.46,86.46,69.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Stocks Table</figDesc><table><row><cell>#</cell><cell>Code</cell><cell>Company</cell><cell>Sector</cell></row>
<row><cell>1</cell><cell>AA</cell><cell>Alcoa Inc.</cell><cell>Basic Materials</cell></row>
<row><cell>2</cell><cell>AIG</cell><cell>American International Group, Inc.</cell><cell>Financial</cell></row>
<row><cell>3</cell><cell>AAPL</cell><cell>Apple Inc.</cell><cell>Technology</cell></row>
<row><cell>4</cell><cell>AXP</cell><cell>American Express Company</cell><cell>Financial</cell></row>
<row><cell>5</cell><cell>BA</cell><cell>Boeing Company</cell><cell>Industrial Goods</cell></row>
<row><cell>6</cell><cell>CAT</cell><cell>Caterpillar, Inc.</cell><cell>Industrial Goods</cell></row>
<row><cell>7</cell><cell>DD</cell><cell>E.I. du Pont de Nemours and Company</cell><cell>Basic Materials</cell></row>
<row><cell>8</cell><cell>DIS</cell><cell>Walt Disney Company</cell><cell>Services</cell></row>
<row><cell>9</cell><cell>GE</cell><cell>General Electric Company</cell><cell>Conglomerates</cell></row>
<row><cell>10</cell><cell>HD</cell><cell>Home Depot, Inc.</cell><cell>Services</cell></row>
<row><cell>11</cell><cell>HON</cell><cell>Honeywell International Inc.</cell><cell>Industrial Goods</cell></row>
<row><cell>12</cell><cell>HPQ</cell><cell>HP Inc.</cell><cell>Technology</cell></row>
<row><cell>13</cell><cell>IBM</cell><cell>International Business Machines Corporation</cell><cell>Technology</cell></row>
<row><cell>14</cell><cell>INTC</cell><cell>Intel Corporation</cell><cell>Technology</cell></row>
<row><cell>15</cell><cell>JNJ</cell><cell>Johnson &amp; Johnson</cell><cell>Healthcare</cell></row>
<row><cell>16</cell><cell>JPM</cell><cell>JP Morgan Chase &amp; Co.</cell><cell>Financial</cell></row>
<row><cell>17</cell><cell>KO</cell><cell>Coca-Cola Company</cell><cell>Consumer Goods</cell></row>
<row><cell>18</cell><cell>MCD</cell><cell>McDonald's Corporation</cell><cell>Services</cell></row>
<row><cell>19</cell><cell>MSFT</cell><cell>Microsoft Corporation</cell><cell>Technology</cell></row>
<row><cell>20</cell><cell>MMM</cell><cell>3M Company</cell><cell>Conglomerates</cell></row>
<row><cell>21</cell><cell>MO</cell><cell>Altria Group, Inc.</cell><cell>Consumer Goods</cell></row>
<row><cell>22</cell><cell>PFE</cell><cell>Pfizer, Inc.</cell><cell>Healthcare</cell></row>
<row><cell>23</cell><cell>PG</cell><cell>Procter &amp; Gamble Company</cell><cell>Consumer Goods</cell></row>
<row><cell>24</cell><cell>UTX</cell><cell>United Technologies Corporation</cell><cell>Conglomerates</cell></row>
<row><cell>25</cell><cell>VZ</cell><cell>Verizon Communications Inc.</cell><cell>Technology</cell></row>
<row><cell>26</cell><cell>WMT</cell><cell>Wal-Mart Stores, Inc.</cell><cell>Services</cell></row>
<row><cell>27</cell><cell>ABIO</cell><cell>ARCA biopharma, Inc.</cell><cell>Healthcare</cell></row>
<row><cell>28</cell><cell>AMGN</cell><cell>Amgen Inc.</cell><cell>Healthcare</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Numerical results</figDesc><table><row><cell></cell><cell cols="3">NMF</cell><cell cols="3">C-NMF</cell><cell cols="3">CH-NMF</cell></row>
<row><cell>k</cell><cell># iter</cell><cell>error</cell><cell># clusters</cell><cell># iter</cell><cell>error</cell><cell># clusters</cell><cell># iter</cell><cell>error</cell><cell># clusters</cell></row>
<row><cell>3</cell><cell>1528</cell><cell>33.7703</cell><cell>3</cell><cell>500</cell><cell>45.7185</cell><cell>2</cell><cell>1</cell><cell>47.5844</cell><cell>2</cell></row>
<row><cell>4</cell><cell>2355</cell><cell>27.1966</cell><cell>4</cell><cell>500</cell><cell>42.4148</cell><cell>2</cell><cell>1</cell><cell>43.9824</cell><cell>4</cell></row>
<row><cell>5</cell><cell>3358</cell><cell>21.0838</cell><cell>5</cell><cell>500</cell><cell>40.2502</cell><cell>2</cell><cell>1</cell><cell>38.6585</cell><cell>4</cell></row>
<row><cell>6</cell><cell>2523</cell><cell>16.9987</cell><cell>4</cell><cell>500</cell><cell>33.5761</cell><cell>2</cell><cell>1</cell><cell>56.4050</cell><cell>4</cell></row>
<row><cell>7</cell><cell>5000</cell><cell>14.6706</cell><cell>6</cell><cell>500</cell><cell>38.6675</cell><cell>2</cell><cell>1</cell><cell>32.5755</cell><cell>5</cell></row>
<row><cell>8</cell><cell>5000</cell><cell>13.5482</cell><cell>7</cell><cell>500</cell><cell>32.1786</cell><cell>2</cell><cell>1</cell><cell>46.2535</cell><cell>5</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>NMF Clusters</figDesc><table><row><cell>k = 3</cell><cell>k = 4</cell><cell>k = 5</cell><cell>k = 6</cell><cell>k = 7</cell><cell>k = 8</cell></row><row><cell cols="3">3 4 5 6 3 5 8 10 12 14 15</cell><cell cols="3">3 4 5 6 3 4 5 6 11 15 17 18</cell></row><row><cell cols="6">7 8 9 10 11 15 16 16 17 18 7 8 10 11 7 8 14 16 20 21 22 23</cell></row><row><cell cols="6">11 14 15 16 19 20 21 19 20 23 13 14 15 16 19 20 24 24 25 26 28</cell></row><row><cell cols="4">17 19 20 21 22 23 24 24 25 26 17 18 19 20</cell><cell></cell><cell></cell></row><row><cell cols="2">22 23 24 25 25 26 28</cell><cell></cell><cell>21 22 24 25</cell><cell></cell><cell></cell></row><row><cell>26 28</cell><cell></cell><cell></cell><cell>28</cell><cell></cell><cell></cell></row><row><cell cols="3">12 13 18 6 7 12 13 4 5 8 10</cell><cell cols="2">23 26 12 15 23</cell><cell>2 4 5 8</cell></row><row><cell></cell><cell cols="2">14 17 18 21 22 28</cell><cell></cell><cell>26</cell><cell>9 10</cell></row><row><cell>1 2 27</cell><cell>27</cell><cell>7 11</cell><cell>1 2 9 27</cell><cell>21 28</cell><cell>27</cell></row><row><cell></cell><cell cols="2">1 2 4 9 3 6 13</cell><cell cols="2">12 10 13 17</cell><cell>16</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>18 22 25</cell><cell></cell></row><row><cell></cell><cell></cell><cell>1 2 9 27</cell><cell></cell><cell>27</cell><cell>3 13 14</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>1 2 9 11</cell><cell>1 6 7 12</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>19</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>C-NMF Clusters. They correspond respectively to Alcoa Inc. (Basic Materials), American International Group Inc. (Financial), General Electric Company (Conglomerates), ARCA biopharma Inc. (Healthcare), HP Inc. (Technology). Other stocks that are often grouped together are shown in the tables in different colours.</figDesc><table><row><cell>k = 3</cell><cell>k = 4</cell><cell>k = 5</cell><cell>k = 6</cell><cell>k = 7</cell><cell>k = 8</cell></row><row><cell>1 2 12 27</cell><cell>1 2 9 12 27</cell><cell>1 2 9 12 27</cell><cell>1 2 27</cell><cell>1 2 6 9 12 23 27</cell><cell>1 2 9 12 27</cell></row><row><cell>3 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28</cell><cell>3 4 5 6 7 8 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28</cell><cell>3 4 5 6 7 8 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28</cell><cell>3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28</cell><cell>3 4 5 7 8 10 11 13 14 15 16 17 18 19 20 21 22 24 25 26 28</cell><cell>3 4 5 6 7 8 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>CH-NMF Clusters</figDesc><table><row><cell>k = 3</cell><cell>k = 4</cell><cell>k = 5</cell><cell>k = 6</cell><cell>k = 7</cell><cell>k = 8</cell></row><row><cell>1 2 9 12 27</cell><cell>1 2 9 27</cell><cell cols="2">1 2 9 27 1 2 9 12 27</cell><cell>1 2 9</cell><cell>2 27</cell></row><row><cell>3 4 5 6 7 8</cell><cell cols="2">3 4 5 7 8 3 5 7 8 10</cell><cell>3 4 5 7 8</cell><cell cols="2">3 4 6 7 4 8 10 11</cell></row><row><cell cols="2">9 10 11 13 14 10 11 14 15</cell><cell cols="2">11 18 21 11 14 15 16</cell><cell cols="2">13 14 17 13 15 16 17</cell></row><row><cell cols="2">15 16 17 18 19 16 17 18 19</cell><cell cols="2">22 24 25 17 20 22 23</cell><cell cols="2">20 23 24 18 20 21 22</cell></row><row><cell cols="2">20 21 22 23 24 20 21 22 23</cell><cell></cell><cell>24 25 26 28</cell><cell cols="2">25 26 23 24 25 26</cell></row><row><cell cols="2">25 26 28 24 25 26 28</cell><cell></cell><cell></cell><cell></cell><cell>28</cell></row><row><cell></cell><cell cols="2">6 13 4 6 12 14</cell><cell cols="2">6 13 5 8 10 11</cell><cell>1 5 9 12</cell></row><row><cell></cell><cell></cell><cell>15 16 17 19</cell><cell></cell><cell>15 16 18</cell><cell>14 19</cell></row><row><cell></cell><cell></cell><cell>20 23 26 28</cell><cell></cell><cell>19 21 22 28</cell><cell></cell></row><row><cell></cell><cell>12</cell><cell cols="2">13 10 18 19 21</cell><cell>12</cell><cell>6 7</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>27</cell><cell>3</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was partially funded by the Italian PON 2007-2013 project PON02_00563_3489339 'Puglia@Service'.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Clustering approaches for financial data analysis: a</title>
		<author>
			<persName><forename type="first">F</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Le-Khac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kechadi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Data Mining (DMIN)</title>
				<meeting>the International Conference on Data Mining (DMIN)</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="1" to="7" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Portfolio diversification using subspace factorizations</title>
		<author>
			<persName><forename type="first">R</forename><surname>De Fréin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Drakakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rickard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CISS 2008. 42nd Annual Conference on Information Sciences and Systems</title>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="1075" to="1080" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Convex and semi-nonnegative matrix factorizations</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H Q</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="45" to="55" />
			<date type="published" when="2010-01">Jan 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Analysis of financial data using non-negative matrix factorization</title>
		<author>
			<persName><forename type="first">K</forename><surname>Drakakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rickard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>De Fréin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cichocki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Mathematical Forum</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">38</biblScope>
			<biblScope unit="page" from="1853" to="1870" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Option pricing and portfolio optimization</title>
		<author>
			<persName><forename type="first">R</forename><surname>Korn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Korn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Graduate Studies in Mathematics</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page">18</biblScope>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Algorithms for non-negative matrix factorization</title>
		<author>
			<persName><forename type="first">D</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Seung</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="556" to="562" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Non-negative matrix factorization for stock market pricing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Biomedical Engineering and Informatics. BMEI&apos;09</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Some methods for classification and analysis of multivariate observations</title>
		<author>
			<persName><forename type="first">J</forename><surname>MacQueen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the fifth Berkeley symposium on mathematical statistics and probability</title>
				<meeting>the fifth Berkeley symposium on mathematical statistics and probability<address><addrLine>Oakland, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1967">1967</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="281" to="297" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Portfolio selection</title>
		<author>
			<persName><forename type="first">H</forename><surname>Markowitz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The journal of finance</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="77" to="91" />
			<date type="published" when="1952">1952</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Yes we can: simplex volume maximization for descriptive web-scale matrix factorization</title>
		<author>
			<persName><forename type="first">C</forename><surname>Thurau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kersting</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bauckhage</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Information and knowledge management. CIKM</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1785" to="1788" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Stock trend extraction via matrix factorization</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advanced Data Mining and Applications</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="516" to="526" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
