<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Correlated Variable Selection in High-dimensional Linear Models using Dual Polytope Projection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Niharika Gauraha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Swapan K. Parui</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Statistical Institute</institution>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>[1] Robert Tibshirani. "Regression shrinkage and selection via the lasso". In: J. R. Statist. Soc. B 58 (1996), pp. 267-288. [2] Jie Wang et al. "Lasso Screening Rules via Dual Polytope Projection". In: NIPS (2013).</p>
      </abstract>
      <kwd-group>
        <kwd>Correlated Variable Selection</kwd>
        <kwd>Lasso</kwd>
        <kwd>Dual Polytope Projection</kwd>
        <kwd>High-dimensional Data Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title />
      <p>We consider the case of high-dimensional linear models (p ≫ n) with strong
empirical correlation among variables. The Lasso is a widely used regularized
regression method for variable selection, but it tends to select a single variable
from a group of strongly correlated variables even if many or all of these variables
are important. In many situations it is desirable to identify all the relevant
correlated variables; examples include micro-array analysis and genome-wide
association studies. We propose to use the Dual Polytope Projection (DPP) rule
to select the relevant correlated variables that are not selected by the Lasso.</p>
      <p>We consider the usual linear model setup, given as Y = Xβ + ε. Let
λ ≥ 0 be a regularization parameter. Then the Lasso estimator (see [1]) is defined
as: β̂(λ) = argmin_{β ∈ ℝ^p} (1/2)‖Y - Xβ‖₂² + λ‖β‖₁. Let
λ_max = max_{1 ≤ j ≤ p} |X_j^T Y|; then for all λ ∈ [λ_max, ∞), we have
β̂(λ) = 0. It has been shown that screening methods based on the DPP rule are
highly effective in reducing the dimensionality by discarding the irrelevant
variables (see [2]). Suppose we want to compute the Lasso solution for a
λ ∈ (0, λ_max); the (global strong) DPP rule discards the jth variable whenever
|X_j^T Y| &lt; 2λ - λ_max (i.e., variables having smaller inner products
with the response).</p>
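      <p>The screening computation above can be sketched in a few lines of numpy. This is a minimal illustration, not code from the paper; the function name dpp_screen is ours, and the columns of X are assumed to be standardized.</p>
      <preformat>
```python
import numpy as np

def dpp_screen(X, Y, lam):
    """Global strong DPP rule (see [2]): return indices of variables KEPT.

    A variable j is discarded whenever |X_j^T Y| falls below 2*lam - lam_max.
    """
    corr = np.abs(X.T @ Y)   # inner products |X_j^T Y| for each column j
    lam_max = corr.max()     # smallest lambda yielding the all-zero solution
    kept = np.flatnonzero(corr >= 2 * lam - lam_max)
    return kept, lam_max
```
      </preformat>
      <p>For λ at or above λ_max / 2 the threshold 2λ - λ_max is positive and screening starts to discard variables; for smaller λ every variable is kept.</p>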
      <p>Exploiting the above property, we propose a two-stage procedure for variable
selection. At the first stage, we perform Lasso with cross-validation and
choose the regularization parameter λ_Lasso that optimizes the prediction. At
the second stage, we select all the variables for which |X_j^T Y| ≥ 2λ_Lasso - λ_max.
Although the Lasso solution at λ_Lasso does not include all the relevant correlated
variables, these correlated variables have inner products of similar magnitude
with the response. Hence, all the relevant correlated predictors also get
selected at the second stage.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>