<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">From Covariance to Comode in context of Principal Component Analysis</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Daniyal</forename><surname>Kazempour</surname></persName>
							<email>kazempour@dbs.ifi.lmu.de</email>
							<affiliation key="aff0">
								<orgName type="institution">Ludwig-Maximilians-Universität München</orgName>
								<address>
									<settlement>Munich</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mathias</forename><surname>Long</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Ludwig-Maximilians-Universität München</orgName>
								<address>
									<settlement>Munich</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Thomas</forename><surname>Yan</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Ludwig-Maximilians-Universität München</orgName>
								<address>
									<settlement>Munich</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><surname>Seidl</surname></persName>
							<email>seidl@dbs.ifi.lmu.de</email>
							<affiliation key="aff0">
								<orgName type="institution">Ludwig-Maximilians-Universität München</orgName>
								<address>
									<settlement>Munich</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">From Covariance to Comode in context of Principal Component Analysis</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">C441779190EB221E1A3A2BB07F6E23E8</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T18:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Covariance</term>
					<term>Comode</term>
					<term>Principal Component Analysis</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>When it comes to dimensionality reduction, Principal Component Analysis (PCA) is among the most well-known methods. Despite its popularity, PCA is sensitive to outliers, which can be traced back to the fact that the method relies on a covariance matrix. While there is a variety of sophisticated methods for enhancing the robustness of PCA, in this work-in-progress we provide an approach that is intriguingly simple: the covariance matrix is replaced by a so-called comode matrix. Our experiments show that this minor modification significantly reduces the reconstruction loss. In this work we introduce the comode and its relation to the MeanShift algorithm, including its bandwidth parameter, compare it against the classic covariance matrix in an experiment, and evaluate the impact of the bandwidth hyperparameter on the reconstruction error.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In cases where we have to deal with high-dimensional data, a common strategy is to perform a Principal Component Analysis (PCA) <ref type="bibr" target="#b4">[5]</ref>. A PCA yields eigenpairs consisting of eigenvectors and their corresponding eigenvalues. In the use case of dimensionality reduction, projecting the data onto the eigenvectors with the top-k eigenvalues incurs a reconstruction error when projecting it back to the full-dimensional space. This reconstruction error decreases as more principal components are incorporated; nevertheless, even with more principal components, the reconstruction error can remain significantly high. Outliers can heavily skew the results, which originates in the mere observation that an outlier can skew the mean of each feature of a data set. More robust measures of central tendency are the median and the mode. In this work we propose the so-called comode matrix as an alternative to the covariance matrix from which the eigenvalues and eigenvectors are computed in a PCA. Our contribution in this work-in-progress shows that it is more robust towards outliers, but at the same time dependent on the choice of a hyperparameter known as the bandwidth.</p></div>
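The down- and back-projection described above can be sketched in a few lines. The following is a minimal NumPy illustration (not the authors' code) of the reconstruction error of a covariance-based PCA on a hypothetical toy data set with one injected outlier:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy 2-D data lying close to a line, plus a single outlier
D = rng.normal(size=(100, 1)) @ np.array([[1.0, 0.5]])
D = np.vstack([D, [[10.0, -10.0]]])

def pca_reconstruction_mse(D, k):
    """Project onto the top-k eigenvectors of the covariance matrix,
    back-project, and return the mean squared reconstruction error."""
    mu = D.mean(axis=0)
    X = D - mu
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    U = vecs[:, ::-1][:, :k]           # top-k principal components
    Y = X @ U                          # project down to k dimensions
    Z = Y @ U.T + mu                   # project back up
    return np.mean(np.sum((D - Z) ** 2, axis=1))

mse1 = pca_reconstruction_mse(D, 1)   # lossy: one component
mse2 = pca_reconstruction_mse(D, 2)   # lossless: all components kept
```

With all components retained the reconstruction is exact up to floating-point error, which is why the error only becomes informative once components are discarded.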
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Many efforts have been made to make PCA more robust towards noise. In the work of <ref type="bibr" target="#b5">[6]</ref> the authors describe a robust M-estimation algorithm for capturing multivariate representations of high-dimensional data, exemplified on images, known as Robust Principal Component Analysis. The authors elaborate that methods such as RANSAC <ref type="bibr" target="#b2">[3]</ref> and Least Median of Squares would be more robust compared to M-estimation, yet their application to high-dimensional data remains unclear. In a classical approach by Candès et al. <ref type="bibr" target="#b0">[1]</ref>, the outliers are modeled by a sparse matrix, based on the assumption that the data matrix can be expressed as the sum of a low-rank and a sparse matrix. The methods mentioned so far are complex and sophisticated. We challenge the task of robust PCA by asking: what if we simply replace the covariance matrix by a comode matrix? Since the mode is insensitive to noise, so, according to our hypothesis, should be the comode.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Comode</head><p>Given a data matrix D, each of its rows represents a data record and its columns represent the features A_1, ..., A_d. When performing a PCA, first a covariance matrix Σ is computed. The covariance is a generalization of the variance: for computing the covariance, the expected value E (mean) of every feature is subtracted from it. Since the mean is sensitive towards outliers, the computation of the covariance matrix is also sensitive towards outliers. A more robust measure is the mode. Generalizing the mode in the same way the variance is generalized to the covariance leads to the comode:</p><formula xml:id="formula_0">com(A_i, A_j) := mode((A_i − mode(A_i))(A_j − mode(A_j)))</formula><p>Building on the definition of the comode, we define a comode matrix Ω:</p><formula xml:id="formula_1">Ω_D =
⎛ mode(A_1)      ⋯  com(A_1, A_d) ⎞
⎜ ⋮              ⋱  ⋮             ⎟
⎝ com(A_d, A_1)  ⋯  mode(A_d)     ⎠</formula><p>But how do we actually compute com(A_i, A_j), the mode of a variable A_i in dependence of another variable A_j? One solution we propose here relies on the so-called MeanShift <ref type="bibr" target="#b1">[2]</ref> algorithm. MeanShift is a method which was primarily used for locating the maxima, i.e. the modes, of a given density function. It is well known as a mode-seeking algorithm, and its applications have been extended to tasks like cluster analysis. For any given data object x, the shifted data object at one iteration is computed as:</p><formula xml:id="formula_2">m(x) = ( Σ_{i=1}^{N(x)} K(x − x_i) x_i ) / ( Σ_{i=1}^{N(x)} K(x − x_i) )   (1)</formula><p>where K denotes a kernel function and N(x) are all objects in the neighborhood of x within a bandwidth σ. The bandwidth follows the intuition of an ε-range: it defines the local neighborhood of a data object x. The mean shift of x is the difference m(x) − x. 
At this point it may still be unclear how the MeanShift leads to the detection of the comodes. A comode com(A_i, A_j) is determined by computing the MeanShift µ_shift(D_{A_i,A_j}) on a given dataset D, considering only its features (= random variables) A_i and A_j.</p><p>The comode matrix can thus be expressed in terms of MeanShift computations:</p><formula xml:id="formula_3">com(A_i, A_j) := mode((A_i − mode(A_i))(A_j − mode(A_j)))
             = µ_shift((A_i − µ_shift(D_{A_i,A_j}))(A_j − µ_shift(D_{A_i,A_j})))</formula><p>What remains unclear so far is how to approach the fact that there can be not only one mode (unimodal) but several (multimodal). In this work-in-progress we have focused on taking the mode with the highest frequency, i.e. the top mode to which the highest number of objects belongs. In the case where several top modes are equally frequent, we randomly select one of them. Due to the brevity of this paper, we leave it to future work to elaborate on and evaluate the impact of choosing different top modes.</p></div>
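The construction above can be sketched in code. This is a minimal, hypothetical reading using sklearn's MeanShift (the library used in the authors' experiments); the helper names `top_mode`, `comode`, and `comode_matrix` are ours, and the diagonal follows the matrix Ω above, holding the plain modes:

```python
import numpy as np
from sklearn.cluster import MeanShift

def top_mode(x, bandwidth):
    """Return the MeanShift mode with the most members (1-D input).
    Ties are broken arbitrarily by argmax, as a stand-in for the
    random selection described in the text."""
    ms = MeanShift(bandwidth=bandwidth).fit(x.reshape(-1, 1))
    labels, counts = np.unique(ms.labels_, return_counts=True)
    return ms.cluster_centers_[labels[np.argmax(counts)], 0]

def comode(ai, aj, bandwidth):
    # com(A_i, A_j) = mode((A_i - mode(A_i)) * (A_j - mode(A_j)))
    prod = (ai - top_mode(ai, bandwidth)) * (aj - top_mode(aj, bandwidth))
    return top_mode(prod, bandwidth)

def comode_matrix(D, bandwidth):
    """Comode matrix Ω: plain modes on the diagonal, comodes elsewhere."""
    d = D.shape[1]
    M = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            M[i, j] = (top_mode(D[:, i], bandwidth) if i == j
                       else comode(D[:, i], D[:, j], bandwidth))
    return M
```

Note that the modes are sought per pair of features on 1-D product values here; whether the comode should instead be computed on the joint 2-D density is one of the design choices the text leaves open.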
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>In our experiments we use the MeanShift with a flat kernel from sklearn<ref type="foot" target="#foot_0">1</ref>. For reproducibility purposes, the code is made available on GitHub<ref type="foot" target="#foot_1">2</ref>. We conduct our experiments on the iris dataset<ref type="foot" target="#foot_2">3</ref>, which consists of 150 instances and 4 features. We compute the PCA using the comode instead of the covariance matrix. We discard all principal components from the eigenvector matrix U except the first one, and project the data D down to its lower (one-)dimensional representation Y through Y = D · U_{k=1}. We reconstruct the data by projecting it back to its original full-dimensional representation through Z = Y · U_{k=1}^T. Projecting the data down and back again enables us to compute the Mean Squared Error (MSE), defined as</p><formula xml:id="formula_4">MSE(D, Z) = (1/n) Σ_{i=1}^{n} (d_i − z_i)²</formula><p>where d_i ∈ D, z_i ∈ Z, and n denotes the number of objects, for which n = |D| = |Z| holds. We apply this procedure analogously for keeping the first two, three, ..., d principal components. Since we want to investigate the effect of the choice of the bandwidth, we further perform the comode computation with the bandwidths σ ∈ {0.1, 4.0, 5.5}. The results can be seen in Figure <ref type="figure">1</ref>, where the horizontal axis represents the principal components in descending order of their corresponding eigenvalues and the vertical axis represents the MSE. It can be seen from Figure <ref type="figure">1</ref> that PCA with the comode and a bandwidth of σ = 0.1 yields an MSE (∼ 1) which is significantly below that of the classical PCA with a covariance matrix (∼ 7) when taking only the first principal component. When additionally taking the second principal component, the MSE of the covariance-based PCA drops even slightly below the MSE of the comode variant with σ = 0.1, which also holds when taking three principal components. 
A larger bandwidth of σ = 4.0 yields MSE values whose course is similar to that of the covariance-based PCA, but with overall higher MSE values. At this point it is interesting for further research to investigate whether there is an 'even' point regarding the bandwidth at which the MSE for different numbers of principal components equals the MSE of the covariance-based PCA. A bandwidth of σ = 5.5 yields MSE results which are by far worse compared to the other settings.</p><p>However, we have to take the MSE results with a grain of salt, owing to an observation made in <ref type="bibr" target="#b3">[4]</ref>. There the insight was that a low MSE (or in that work: MAE) does not imply a high robustness towards outliers. In fact, the opposite is the case: a low MSE indicates that even outliers can be reconstructed well, which in turn means that the computed principal component is distorted by the outliers. In contrast, a high MSE can indicate that a principal component has been computed which does not take outliers into account, thereby giving more weight to the majority of the objects following the direction of largest variance. Since, as stated, a high MSE can indicate a better principal component, it is vital in future work to develop methods by which the reconstruction error without the outliers is computed. With the bandwidth we have a control unit for tuning the comode computation such that either the MSE is minimized or the overall deviation from the principal component is minimized. </p></div>
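The experimental loop (eigendecomposition, discarding components, down- and back-projection, MSE per number of retained components) can be sketched as follows. This is a hypothetical re-implementation on the iris data, shown with the covariance matrix as a stand-in; the comode matrix of Section 3 would be passed to `mse_per_k` in its place:

```python
import numpy as np
from sklearn.datasets import load_iris

D = load_iris().data                 # 150 instances, 4 features
mu = D.mean(axis=0)
X = D - mu

def mse_per_k(matrix):
    """MSE after projecting onto the top-k eigenvectors of the given
    symmetric matrix and back, for k = 1, ..., d."""
    vals, vecs = np.linalg.eigh(matrix)
    order = np.argsort(vals)[::-1]   # descending eigenvalues
    out = []
    for k in range(1, X.shape[1] + 1):
        U = vecs[:, order[:k]]
        Z = X @ U @ U.T              # down- and back-projection
        out.append(np.mean(np.sum((X - Z) ** 2, axis=1)))
    return out

# covariance baseline; a comode matrix would be substituted here
errors = mse_per_k(np.cov(X, rowvar=False))
```

Because the eigenvector subspaces are nested, the MSE is non-increasing in k and reaches (numerical) zero once all d components are kept, which matches the qualitative course of the curves in Figure 1.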
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Future Work</head><p>In this work-in-progress we have presented the Comode in the context of PCA. A first experiment showed promising results, outperforming a covariance-based PCA while being intriguingly simple. There are, however, interesting aspects which demand further research. First and foremost, it remains an open question how to deal with multimodal cases: what are the effects on the PCA if we choose the second or third strongest mode(s)? How are good bandwidths determined? For these aspects we may draw on previous works which aimed at estimating good bandwidths for MeanShift. Further, it is of interest to investigate whether feature-specific bandwidths have any impact on the robustness of the Comode-based PCA. As of now, we have one bandwidth which is valid for all features, neglecting feature-specific bandwidths. We hope to stimulate further research on the Comode, revealing its limitations as well as its potentials.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. MSE with increasing number of principal components on the iris data set. orange: comode with σ = 0.1; green: comode with σ = 4.0; red: comode with σ = 5.5; blue: covariance variant</figDesc><graphic coords="4,212.58,458.26,190.21,128.19" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MeanShift.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://github.com/hamilton-function/comode</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://archive.ics.uci.edu/ml/datasets/iris</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgement</head><p>This work has been funded by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A. The authors of this work take full responsibilities for its content.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Robust principal component analysis?</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Candès</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wright</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the ACM</title>
		<imprint>
			<biblScope unit="volume">58</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">11</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
	<note>JACM)</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Mean shift, mode seeking, and clustering</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE transactions on pattern analysis and machine intelligence</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="790" to="799" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Fischler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Bolles</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="381" to="395" />
			<date type="published" when="1981">1981</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">On coMADs and Principal Component Analysis</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kazempour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A X</forename><surname>Hünemörder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Seidl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Springer Lecture Notes in Computer Science</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">SISAP</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Liii. on lines and planes of closest fit to systems of points in space</title>
		<author>
			<persName><forename type="first">K</forename><surname>Pearson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="559" to="572" />
			<date type="published" when="1901">1901</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Robust principal component analysis for computer vision</title>
		<author>
			<persName><forename type="first">F</forename><surname>De La Torre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Black</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings Eighth IEEE International Conference on Computer Vision. ICCV</title>
				<meeting>Eighth IEEE International Conference on Computer Vision. ICCV</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="362" to="369" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
