CEUR Workshop Proceedings, Vol-1670: https://ceur-ws.org/Vol-1670/paper-23.pdf
           Correlated Variable Selection in
      High-dimensional Linear Models using Dual
                Polytope Projection

                    Niharika Gauraha and Swapan K. Parui

                         Indian Statistical Institute, India
    We consider the case of high-dimensional linear models (p ≫ n) with strong
empirical correlation among variables. The Lasso is a widely used regularized
regression method for variable selection, but it tends to select a single variable
from a group of strongly correlated variables even if many or all of these variables
are important. In many situations it is desirable to identify all the relevant
correlated variables; examples include microarray analysis and genome-wide
association studies. We propose to use the Dual Polytope Projection (DPP) rule
to select the relevant correlated variables that are not selected by the Lasso.
    We consider the usual linear model setup Y = Xβ + ε. Let λ ≥ 0 be a
regularization parameter. Then the Lasso estimator (see [1]) is defined as

    β̂(λ) = argmin_{β ∈ ℝ^p} { (1/2)‖Y − Xβ‖₂² + λ‖β‖₁ }.

Let λmax = max_{1≤j≤p} |X_j^T Y|; then for all λ ∈ [λmax, ∞) we have
β̂(λ) = 0. It has been shown that screening methods based on the DPP rule are
highly effective in reducing dimensionality by discarding irrelevant variables
(see [2]). Suppose we want to compute the Lasso solution for some λ ∈ (0, λmax);
the (global strong) DPP rule discards the j-th variable whenever

    |X_j^T Y| < 2λ − λmax,

that is, variables having smaller inner products with the response.
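As an illustration (not part of the paper), the global strong DPP screening rule above can be sketched in a few lines of NumPy. The function name `dpp_screen` and the toy data are our own; we assume columns of X are on comparable scales.

```python
import numpy as np

def dpp_screen(X, y, lam):
    """Global strong DPP screening: return indices of the variables that
    survive, i.e. may be nonzero in the Lasso solution at lam."""
    corr = np.abs(X.T @ y)        # |X_j^T y| for each column j
    lam_max = corr.max()          # smallest lam giving the all-zero solution
    # discard variable j whenever |X_j^T y| < 2*lam - lam_max
    return np.flatnonzero(corr >= 2 * lam - lam_max)

# toy example: column 1 is a near-duplicate of column 0, the true predictor
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)
y = X[:, 0] + 0.1 * rng.standard_normal(n)

lam = 0.9 * np.abs(X.T @ y).max()
kept = dpp_screen(X, y, lam)
```

Both strongly correlated columns clear the threshold, while most of the noise columns are discarded before any Lasso solver is run.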
    Exploiting the above property, we propose a two-stage procedure for variable
selection. At the first stage, we perform the Lasso with cross-validation and
choose the regularization parameter λLasso that optimizes prediction. At the
second stage, we select all variables j for which |X_j^T Y| ≥ 2λLasso − λmax.
Although the Lasso solution at λLasso does not include all the relevant
correlated variables, these correlated variables have inner products with the
response of similar magnitude. Hence, all the relevant correlated predictors
also get selected at the second stage.
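A minimal sketch of the two-stage procedure, assuming scikit-learn's LassoCV as the stage-1 cross-validation routine (the function name `two_stage_select` is our own). Note that scikit-learn minimizes (1/(2n))‖y − Xβ‖₂² + α‖β‖₁, so the paper's λLasso corresponds to n·α.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def two_stage_select(X, y, cv=5, random_state=0):
    """Stage 1: choose the regularization level by cross-validated Lasso.
    Stage 2: keep every variable whose inner product with the response
    clears the DPP threshold 2*lam_lasso - lam_max."""
    n = X.shape[0]
    # scikit-learn's alpha is lambda/n under the paper's parameterization
    cv_fit = LassoCV(cv=cv, random_state=random_state).fit(X, y)
    lam_lasso = n * cv_fit.alpha_
    corr = np.abs(X.T @ y)
    lam_max = corr.max()
    selected = np.flatnonzero(corr >= 2 * lam_lasso - lam_max)
    return selected, lam_lasso

# toy correlated design: columns 0 and 1 are near-duplicates
rng = np.random.default_rng(1)
n, p = 60, 120
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)
y = X[:, 0] + 0.1 * rng.standard_normal(n)

selected, lam_lasso = two_stage_select(X, y)
```

Because the two correlated columns have nearly identical inner products with y, both pass the stage-2 threshold even if the cross-validated Lasso itself kept only one of them.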

Keywords: Correlated Variable Selection, Lasso, Dual Polytope Projection,
High-dimensional Data Analysis

References
[1] Robert Tibshirani. “Regression shrinkage and selection via the lasso”. In:
    Journal of the Royal Statistical Society, Series B 58 (1996), pp. 267–288.
[2] Jie Wang et al. “Lasso Screening Rules via Dual Polytope Projection”. In:
    Advances in Neural Information Processing Systems (NIPS). 2013.