=Paper=
{{Paper
|id=Vol-1623/papermp13
|storemode=property
|title=Dual Greedy Algorithm for Conic Optimization Problem
|pdfUrl=https://ceur-ws.org/Vol-1623/papermp13.pdf
|volume=Vol-1623
|authors=Sergei Sidorov, Sergei Mironov, Michael Pleshakov
|dblpUrl=https://dblp.org/rec/conf/door/SidorovMP16
}}
==Dual Greedy Algorithm for Conic Optimization Problem==
S. P. Sidorov, S. V. Mironov, M. G. Pleshakov

Saratov State University, Russian Federation, http://www.sgu.ru

'''Abstract.''' In this paper we propose an algorithm for finding approximate sparse solutions of a convex optimization problem with conic constraints, and we examine the convergence properties of the algorithm with applications to the index tracking problem and to unconstrained l1-penalized regression.

'''Keywords:''' greedy algorithm; constrained convex optimization; conic optimization problem; index tracking problem.

This work was supported by the Russian Foundation for Basic Research under Grant 14-01-00140. Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: A. Kononov et al. (eds.): DOOR 2016, Vladivostok, Russia, published at http://ceur-ws.org.

===1 Introduction===

Greedy algorithms have been studied intensively since the 1980s; their main merit is that they provide constructive methods for finding best m-term approximations. The main contributions to the development of greedy algorithms were made by J. Friedman [1], S. Mallat [2], J. Zhang [3], P. Huber [4], L. Jones [5], A. Barron [6], R. DeVore, V. N. Temlyakov [7], S. V. Konyagin [8] and others.

Let X be a Banach space with norm ∥·∥_X. A set of elements D from the space X is called a dictionary if each element g ∈ D has norm bounded by one, ∥g∥ ≤ 1, and the closure of span D is X, i.e. $\overline{\operatorname{span}\, D} = X$. A dictionary D is called symmetric if −g ∈ D for every g ∈ D. In this paper we assume that the dictionary D is symmetric.

Let E be a convex function defined on X. The problem of convex optimization is to find an approximate solution to the problem

$$E(x) \to \min_{x \in X}. \qquad (1)$$

In many applications it is necessary to find solutions of problem (1) that are sparse with respect to D, i.e. we are interested in solving the following problem:

$$E(x) \to \inf_{x \in \Sigma_m(D)}, \qquad (2)$$

where Σ_m(D) is the set of all m-term polynomials with respect to D:

$$\Sigma_m(D) = \Bigl\{ x \in X : x = \sum_{i=1}^m c_i g_i, \ g_i \in D \Bigr\}. \qquad (3)$$

The paper of V. N. Temlyakov [9] examines greedy algorithms for finding an approximate solution to problem (2), and shows that greedy algorithms with respect to the dictionary D solve problem (1) as well. The paper [9] proposes the weak greedy Chebyshev algorithm and proves estimates of the convergence rate of the algorithm based on geometric properties of the function E. Developments of the ideas presented in [9] can be found in [10], [11].

Let Y be the m-dimensional Euclidean space, Y = R^m. Let E be a convex function defined on X. Many convex constrained optimization problems can be expressed in the conic form

$$E(x) \to \min_{A(x)+b \in K}, \qquad (4)$$

where A : X → Y is a linear operator, b ∈ Y, and K is a closed cone in Y. Recall that K is said to be a cone in Y if a_1 x_1 + a_2 x_2 ∈ K for any x_1, x_2 ∈ K and a_1, a_2 ∈ [0, ∞). At first sight the constraint A(x) + b ∈ K may seem too specialized but, as pointed out in [12], any convex subset of Y = R^m can be represented in conic form. In particular, it was shown in [12] that both the Dantzig selector problem [13] and the LASSO regression problem [14] can be reformulated in the form (4); a small worked example of such a reformulation is given below.
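As an elementary illustration (our example, not taken from [12]): a single affine equality constraint ⟨a, x⟩ = β on x ∈ R^n can be encoded as a pair of inequalities landing in the nonnegative orthant. The same device is used for the budget constraint of the index tracking problem in Section 3.1.

```latex
% Encoding the equality constraint <a, x> = beta in the conic form (4):
% it holds iff both <a, x> - beta >= 0 and beta - <a, x> >= 0.
A(x) = \begin{pmatrix} \langle a, x \rangle \\ -\langle a, x \rangle \end{pmatrix},
\qquad
b = \begin{pmatrix} -\beta \\ \beta \end{pmatrix},
\qquad
K = \mathbb{R}^2_{+},
\qquad
A(x) + b \in K \;\Longleftrightarrow\; \langle a, x \rangle = \beta .
```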
We are interested in finding an approximate solution to problem (4). Furthermore, in many applications it is necessary to find solutions of (4) that are sparse with respect to the dictionary D:

$$E(x) \to \inf_{x \in \Sigma_m(D),\; A(x)+b \in K}. \qquad (5)$$

In this paper we propose the dual weak greedy Chebyshev algorithm for finding approximate sparse solutions of the convex optimization problem with a conic constraint, and we examine its convergence properties.

===2 Dual weak greedy Chebyshev algorithm===

Solving problem (4) directly may be computationally inefficient, since projecting onto the set {x : A(x) + b ∈ K} may be expensive, as may be finding even a single feasible point. For example, projection onto the set {x : ∥y − Ax∥₂ ≤ ε}, x ∈ R^n, y ∈ R^m, is computationally expensive, whereas projection onto the dual cone is trivial.

The dual of problem (4) is

$$F(\lambda) \to \max_{\lambda \in K^*}, \qquad (6)$$

where F(λ) is the Lagrange dual function

$$F(\lambda) := \inf_x L(x, \lambda) = \inf_x \bigl( E(x) - \langle \lambda, A(x) + b \rangle \bigr),$$

and K* is the dual cone, K* := {λ ∈ Y : ⟨λ, y⟩ ≥ 0 for all y ∈ K}. Let A* : R^m → R^n denote the adjoint of the linear operator A, and let E* be the convex conjugate of E, i.e.

$$E^*(z) = \sup_x \bigl( \langle z, x \rangle - E(x) \bigr).$$

Then the dual problem (6) can be rewritten in the form

$$-E^*(A^*(\lambda)) - \langle b, \lambda \rangle \to \max_{\lambda \in K^*}.$$

Recall that the following inequality holds for any feasible primal/dual pair x and λ:

$$E(x) - F(\lambda) = E(x) + E^*(A^*(\lambda)) + \langle b, \lambda \rangle \ge \langle x, A^*(\lambda) \rangle + \langle b, \lambda \rangle = \langle A(x) + b, \lambda \rangle \ge 0.$$

If the primal and dual problems are strictly feasible, then E(x) = F(λ) at optimal solutions, and there exist (not necessarily unique) points x*, λ* such that E(x*) = F(λ*) = L(x*, λ*), satisfying the optimality conditions

$$A(x^*) + b \in K, \quad \lambda^* \in K^*, \quad \langle A(x^*) + b, \lambda^* \rangle = 0, \quad A^*(\lambda^*) \in \partial E(x^*), \qquad (7)$$

where ∂E is the subdifferential of E.

Let τ := {t_m}_{m=1}^∞, t_m ∈ [0, 1], be a weakness sequence, and let λ* satisfy (7). We suppose that the function E is Fréchet differentiable. Note that convexity of E implies that for any x, y

$$E(y) \ge E(x) + \langle E'(x), y - x \rangle,$$

where E'(x) denotes the Fréchet derivative of E at x. Denote

$$Q(x) := E(x) - \langle A(x) + b, \lambda^* \rangle.$$

'''Dual weak greedy Chebyshev algorithm.''' Let G_0 := 0. For each m ≥ 1 we define G_m by induction as follows.

1. (Gradient greedy step) Find an element φ_m ∈ D satisfying
$$\langle -Q'(G_{m-1}), \varphi_m \rangle \ge t_m \sup_{g \in D} \langle -Q'(G_{m-1}), g \rangle.$$

2. (Chebyshev-type step) Find real numbers c_i^*, i = 1, …, m, such that
$$Q\Bigl( \sum_{i=1}^m c_i^* \varphi_i \Bigr) = \inf_{c_1, \dots, c_m} Q\Bigl( \sum_{i=1}^m c_i \varphi_i \Bigr).$$

3. Set $G_m = \sum_{i=1}^m c_i^* \varphi_i$.

The gradient greedy step maximizes a functional determined by the gradient information from the previous step of the algorithm, and the Chebyshev-type step finds the best linear combination of the m approximants {φ_i}_{i=1}^m; see the sketch below.
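To make the two steps concrete, here is a minimal Python sketch for the finite-dimensional case, under our own assumptions: the dictionary is finite and stored as the columns of a matrix, Q and its gradient are supplied by the caller (i.e. λ* is assumed known), and the Chebyshev-type step is delegated to scipy.optimize.minimize. This is an illustration of the scheme above, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def dual_weak_greedy_chebyshev(Q, grad_Q, dictionary, n_steps):
    """Minimal sketch of the dual weak greedy Chebyshev algorithm.

    Q          -- callable, the objective Q(x) = E(x) - <A(x) + b, lambda*>
    grad_Q     -- callable, gradient Q'(x)
    dictionary -- (n, N) array whose columns are the dictionary elements;
                  a symmetric dictionary contains both g and -g as columns
    n_steps    -- number of greedy iterations m
    """
    n, _ = dictionary.shape
    G = np.zeros(n)   # G_0 = 0
    selected = []     # indices of the chosen elements phi_1, ..., phi_m

    for m in range(1, n_steps + 1):
        # Gradient greedy step: choose phi_m with
        #   <-Q'(G_{m-1}), phi_m> >= t_m * sup_{g in D} <-Q'(G_{m-1}), g>.
        # Taking the exact maximizer corresponds to t_m = 1, which satisfies
        # the weak condition for every t_m in [0, 1].
        scores = dictionary.T @ (-grad_Q(G))
        selected.append(int(np.argmax(scores)))
        Phi = dictionary[:, selected]   # (n, m) matrix of phi_1, ..., phi_m

        # Chebyshev-type step: minimize Q over span{phi_1, ..., phi_m}.
        res = minimize(lambda c: Q(Phi @ c), x0=np.zeros(m),
                       jac=lambda c: Phi.T @ grad_Q(Phi @ c))
        G = Phi @ res.x                 # G_m = sum_i c_i^* phi_i
    return G
```

For the quadratic objectives arising in Section 3 the Chebyshev-type step reduces to a small least-squares problem and admits a closed-form solution; the generic minimizer is used here only to keep the sketch short.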
Let Ω := {x ∈ X : E(x) ≤ E(0)} and suppose that Ω is bounded. The modulus of smoothness of the function E on the bounded set Ω is defined by

$$\rho(E, u) = \frac{1}{2} \sup_{x \in \Omega,\, \|y\|=1} |E(x + uy) + E(x - uy) - 2E(x)|. \qquad (8)$$

Let A_1(D) denote the closure (in X) of the convex hull of D. Using the results and ideas of [9], we can prove the following proposition.

'''Theorem 1.''' Let E be a uniformly smooth convex function with modulus of smoothness ρ(E, u) ≤ γu^q, 1 < q ≤ 2. Let V := {x ∈ X : A(x) + b ∈ K}. Let ε > 0 and let the element f^ε ∈ D be such that

$$Q(f^\varepsilon) \le \inf_{x \in D} Q(x) + \varepsilon, \qquad f^\varepsilon / C(\varepsilon) \in A_1(D),$$

for some real number C(ε) ≥ 1. Let p = q/(q − 1). Then

$$E(G_m) - \inf_{x \in \Sigma_m(D) \cap V} E(x) \le \max\Biggl\{ 2\varepsilon,\; C(q, \gamma) A(\varepsilon) \Bigl( C(E, A, b, K, q, \gamma) + \sum_{k=1}^m t_k^p \Bigr)^{1-q} \Biggr\}, \qquad (9)$$

where C(q, γ) and C(E, A, b, K, q, γ) are positive constants not depending on k.

'''Proof.''' We have ρ(Q, u) = ρ(E, u). It follows from Theorem 4.2 of [9] that

$$Q(G_m) - \inf_{x \in D} Q(x) \le \max\Biggl\{ 2\varepsilon,\; C(q, \gamma) A(\varepsilon) \Bigl( C(Q, q, \gamma) + \sum_{k=1}^m t_k^p \Bigr)^{1-q} \Biggr\}. \qquad (10)$$

Then (9) follows from (10) and the chain

$$Q(G_m) - \inf_{x \in \Omega} Q(x) = E(G_m) - \langle A(G_m) + b, \lambda^* \rangle - \inf_{x \in \Omega} \bigl( E(x) - \langle A(x) + b, \lambda^* \rangle \bigr) \ge$$
$$\ge E(G_m) - \langle A(G_m) + b, \lambda^* \rangle - \inf_{x \in \Omega} \bigl( E(x) - \langle A(G_m) + b, \lambda^* \rangle \bigr) =$$
$$= E(G_m) - \inf_{x \in \Omega} E(x) \ge E(G_m) - \inf_{x \in \Sigma_m(D) \cap V} E(x). \qquad \square$$

===3 Applications===

====3.1 Index tracking problem====

Greedy algorithms have shown excellent performance in practical problems of machine learning and optimization. This section shows the use of such techniques for the index tracking problem. Index tracking is a passive financial strategy that tries to replicate the performance of a given index or benchmark. The aim of the investor is to find the weights of the assets in her/his portfolio that minimize the tracking error, i.e. the difference between the performance of the index and that of the portfolio.

For any q > 0 and x = (x_1, …, x_n)^T ∈ R^n, let $\|x\|_q := \bigl( \sum_{i=1}^n |x_i|^q \bigr)^{1/q}$ and let $\|x\|_0 = \lim_{q \to 0+} \|x\|_q^q$ be the number of non-zero elements of x. If q ≥ 1, then ∥x∥_q is the l_q-norm of x ∈ R^n.

Let n be the number of investable assets. Denote by r_{ti} the return of asset i at time t, 1 ≤ i ≤ n, 1 ≤ t ≤ k, and let R = (r_{ti}) be the k × n matrix of returns. A portfolio is defined as a vector of weights, x = (x_1, …, x_n)^T ∈ R^n. We do not allow the portfolio to change over time and do not take transaction costs into account. We assume that

1. one unit of capital is available, i.e. x^T 1_n = 1, where 1_n denotes the vector in R^n in which every component is equal to 1;
2. short selling is allowed, i.e. the weights x_i can be negative.

Let I_t be the index return at time t, 1 ≤ t ≤ k, and I = (I_1, …, I_k)^T ∈ R^k. In traditional index tracking optimization the objective is to find a portfolio with minimal tracking error variance, the sum of squared deviations between portfolio returns and market index returns (see e.g. [15]):

$$x^* = \arg\min \frac{1}{k} \|I - Rx\|_2^2 \quad \text{s.t.} \quad x^T 1_n = 1. \qquad (11)$$

It should be noted that the standard Markowitz model is a special case of the index tracking portfolio model (11) (see, for example, [16], [17]). Since (11) is a convex optimization problem, it can easily be solved by the method of Lagrange multipliers.

In this section we examine an algorithm for solving problem (11) with a cardinality constraint:

$$x^* = \arg\min \frac{1}{k} \|I - Rx\|_2^2 \quad \text{s.t.} \quad x^T 1_n = 1, \ \|x\|_0 \le K, \qquad (12)$$

where K is the limit on the number of assets with non-zero weights in the portfolio. It is supposed that K is substantially smaller than n, K ≪ n.

Classical methods usually require convexity and relatively well-behaved objective functions, and they typically rely on gradients for the descent direction. Standard optimization techniques therefore can no longer be used once we add a constraint on the number of assets in the portfolio: classical methods do not work efficiently on such problems, and many researchers resort to heuristic optimization [18]. To solve problem (12), in this paper we propose to use greedy algorithms. The choice of a greedy algorithm for our analysis is based on the fact that greedy algorithms showed excellent performance on practical problems in [19], [16], and therefore they look promising for the cardinality constrained index tracking problem. On the other hand, greedy algorithms do not necessarily yield an optimal solution.

The constraint x^T 1_n = 1 can be rewritten in the conic form A(x) + b ∈ K with the 2 × n matrix

$$A = \begin{pmatrix} 1 & 1 & \dots & 1 \\ -1 & -1 & \dots & -1 \end{pmatrix}, \qquad b = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad K = \mathbb{R}^2_+ := \{x \in \mathbb{R}^2 : x \ge 0\}. \qquad (13)$$

The following proposition follows from Theorem 1 and gives the rate of convergence of the dual greedy algorithm for the index tracking problem.

'''Corollary 1.''' Let E(x) := (1/k)∥I − Rx∥₂² and V := {x ∈ R^n : x^T 1_n = 1}. Let the dictionary be D = {±e_j}_{j=1}^n, where e_j ∈ R^n with e_{ji} = 1 if i = j and e_{ji} = 0 otherwise. Then

$$E(G_m) - \inf_{x \in \Sigma_m(D) \cap V} E(x) = O(m^{-1}),$$

where G_m is the element obtained at step m of the dual greedy algorithm with A, b, K defined in (13).

'''Proof.''' We have K* = K = R²₊. It is easy to verify that ρ(E, u) ≤ γu² with $\gamma = \frac{1}{k} \sup_{\|y\|=1} \|Ry\|_2^2$: by the parallelogram identity,

$$\rho(E, u) = \frac{1}{2k} \sup_{x \in \Omega,\, \|y\|=1} \bigl( \|I - Rx - uRy\|_2^2 + \|I - Rx + uRy\|_2^2 - 2\|I - Rx\|_2^2 \bigr) = \frac{1}{2k} \sup_{\|y\|=1} 2u^2 \|Ry\|_2^2 = \gamma u^2. \qquad \square$$

A numerical sketch of this setting is given below.
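To make the corollary concrete, the following sketch applies the dual_weak_greedy_chebyshev routine from the Section 2 sketch to a small synthetic instance of (12). All specifics (the sizes k and n, the simulated returns, and the way λ* is obtained by solving the KKT system of (11)) are our illustrative assumptions rather than choices made in the paper.

```python
import numpy as np

# Synthetic data for a small instance of the index tracking problem (12);
# sizes and the random return model are our own illustrative choices.
rng = np.random.default_rng(0)
k, n = 250, 40                                 # observations, assets
R = rng.normal(0.0, 0.01, size=(k, n))         # asset returns
I = R[:, :5] @ np.full(5, 0.2) + rng.normal(0, 0.001, k)  # index from 5 assets

E = lambda x: np.sum((I - R @ x) ** 2) / k
grad_E = lambda x: 2.0 / k * R.T @ (R @ x - I)

# Solve the KKT system of (11) for the multiplier mu of the budget constraint
# x^T 1_n = 1; then lambda* = (max(mu, 0), max(-mu, 0)) lies in K* = R^2_+ and
# satisfies the optimality conditions (7) for the conic form (13).
KKT = np.block([[2.0 / k * R.T @ R, -np.ones((n, 1))],
                [np.ones((1, n)),    np.zeros((1, 1))]])
rhs = np.concatenate([2.0 / k * R.T @ I, [1.0]])
mu = np.linalg.solve(KKT, rhs)[-1]

# Q(x) = E(x) - <A(x) + b, lambda*> = E(x) - mu * (x^T 1_n - 1).
Q = lambda x: E(x) - mu * (np.sum(x) - 1.0)
grad_Q = lambda x: grad_E(x) - mu * np.ones(n)

# Dictionary {+-e_j} of Corollary 1: the columns of [Id, -Id].
D = np.hstack([np.eye(n), -np.eye(n)])

G = dual_weak_greedy_chebyshev(Q, grad_Q, D, n_steps=10)  # sparse portfolio
print("tracking error:", E(G),
      "active assets:", np.count_nonzero(np.round(G, 8)))
```

Since A(x*) + b = 0 at any budget-feasible x*, the complementary slackness condition in (7) holds automatically for this choice of λ*.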
Primal greedy algorithms for the index tracking problem were examined empirically in [16] (in the l₂-norm) and in [20] (in the l₁-norm). An analysis of heuristic algorithms for portfolio optimization problems with a cardinality constraint can be found in [18].

====3.2 Unconstrained l1-penalized regression====

Consider the problem of recovering an unknown vector x₀ ∈ R^n from the data y ∈ R^k and the model

$$y = Bx_0 + z, \qquad (14)$$

where B is a known k × n matrix and z is noise. In many practical problems there are fewer observations/measurements than unknowns, i.e. k ≪ n. Recent works have shown that accurate estimation is often possible under reasonable sparsity assumptions on x₀. One practically and theoretically effective estimator is unconstrained l₁-penalized regression:

$$\|y - Bx\|_2^2 + \lambda \|x\|_1 \to \min, \qquad (15)$$

where λ is a positive real parameter. A homotopy method for solving problem (15) was proposed in [21], [22]; the method is also known as Least Angle Regression (LARS) [23].

Let us rewrite problem (15) in the following way:

$$\|y - Bx\|_2 \to \min \quad \text{s.t.} \quad \|x\|_1 \le \varepsilon, \qquad (16)$$

where ε is a positive scalar. The value of ε should be adjusted so that the true x₀ is feasible, at least with high probability, when the noise term z is stochastic. An equivalent conic formulation of problem (16) is

$$E(x) = \|y - Bx\|_2, \qquad A(x) = (x, 0), \qquad b = (0, \varepsilon), \qquad K = L_1^n, \qquad (17)$$

where $L_1^n := \{(u, t) \in \mathbb{R}^{n+1} : \|u\|_1 \le t\}$, so that A(x) + b = (x, ε) ∈ L_1^n expresses precisely the constraint ∥x∥₁ ≤ ε.

The following corollary of Theorem 1 shows that the rate of convergence of the dual greedy algorithm for the l₁-penalized regression problem (16) is m^{−1}.

'''Corollary 2.''' Let E(x) := ∥y − Bx∥₂² (the square of the objective in (16), which has the same minimizers) and V := {x ∈ R^n : ∥x∥₁ ≤ ε}. Let the dictionary D consist of all columns of the matrix B multiplied by ±1. Then

$$E(G_m) - \inf_{x \in \Sigma_m(D) \cap V} E(x) = O(m^{-1}),$$

where G_m is the element obtained at step m of the dual greedy algorithm with A, b, K defined in (17).

'''Proof.''' As in the proof of Corollary 1, the parallelogram identity gives ρ(E, u) ≤ γu² with $\gamma = \sup_{\|v\|=1} \|Bv\|_2^2$. □

A code sketch of this setting follows.
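The sketch below instantiates the greedy routine for (16) with the dictionary of signed columns of B, as in Corollary 2. The sizes, the noise model, and the normalization of the columns (dictionary elements must have norm at most one) are our own assumptions; we also pick ε large enough that the l₁ constraint is inactive at the optimum, in which case λ* = 0 and Q coincides with E, so no dual correction term appears.

```python
import numpy as np

# Synthetic sparse regression instance y = B x0 + z with k << n; the sizes,
# noise level, and column normalization are our illustrative assumptions.
rng = np.random.default_rng(1)
k, n = 100, 400
B = rng.normal(size=(k, n))
B /= np.linalg.norm(B, axis=0)        # dictionary elements must have norm <= 1
x0 = np.zeros(n)
x0[rng.choice(n, 8, replace=False)] = rng.normal(size=8)
y = B @ x0 + 0.01 * rng.normal(size=k)

# Dictionary of Corollary 2: all columns of B multiplied by +-1. We run the
# greedy routine in the response space R^k, approximating y directly; the
# selection step then picks the signed column most correlated with the
# current residual.
E = lambda u: np.sum((y - u) ** 2)
grad_E = lambda u: 2.0 * (u - y)
D = np.hstack([B, -B])

G = dual_weak_greedy_chebyshev(E, grad_E, D, n_steps=8)
print("residual norm:", np.linalg.norm(y - G))
```

After m steps, G is an m-term combination of signed columns of B, so the corresponding coefficient vector is m-sparse, matching the role of Σ_m(D) in Corollary 2.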
===4 Conclusion===

In this paper we have used greedy-type algorithms to solve (approximately) the problem

$$E(x) \to \inf_{x \in \Sigma_m(D),\; A(x)+b \in K},$$

where E is a convex function defined on a Banach space X, A : X → Y is a linear operator, b ∈ Y, and K is a closed cone in Y. Since solving problem (4) directly can be computationally inefficient, we propose the dual greedy algorithm for the conic optimization problem. Theorem 1 gives estimates of the rate of convergence of the dual greedy algorithm. We should mention that the approach used in this paper is based on the ideas and results developed in [9]. Based on Theorem 1, we proved that the rate of convergence of the dual greedy algorithm for the index tracking problem (12) and for the unconstrained l₁-penalized regression (16) is m^{−1}.

===References===

1. Friedman J. Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5):1189–1232, 2001.
2. Davis G., Mallat S., Avellaneda M. Adaptive greedy approximations. Constructive Approximation, 13:57–98, 1997.
3. Zhang Z., Schwartz S., Wagner L., Miller W. A greedy algorithm for aligning DNA sequences. Journal of Computational Biology, 7(1–2):203–214, 2000.
4. Huber P. J. Projection pursuit. The Annals of Statistics, 13:435–525, 1985.
5. Jones L. On a conjecture of Huber concerning the convergence of projection pursuit regression. The Annals of Statistics, 15:880–882, 1987.
6. Barron A. R., Cohen A., Dahmen W., DeVore R. A. Approximation and learning by greedy algorithms. The Annals of Statistics, 36(1):64–94, 2008.
7. DeVore R. A., Temlyakov V. N. Some remarks on greedy algorithms. Advances in Computational Mathematics, 5:173–187, 1996.
8. Konyagin S. V., Temlyakov V. N. A remark on greedy approximation in Banach spaces. East Journal on Approximations, 5(3):365–379, 1999.
9. Temlyakov V. N. Greedy approximation in convex optimization. Constructive Approximation, 41(2):269–296, 2015.
10. Nguyen H., Petrova G. Greedy strategies for convex optimization. Calcolo, 2016, 1–18.
11. Temlyakov V. N. Dictionary descent in optimization. Analysis Mathematica, 42(1):69–89, 2016.
12. Becker S. R., Candès E. J., Grant M. C. Templates for convex cone problems with applications to sparse signal recovery. Mathematical Programming Computation, 3(3):165–218, 2011.
13. Candès E., Tao T. The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 35:2313–2351, 2007.
14. Tibshirani R. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.
15. Roll R. A mean/variance analysis of tracking error. The Journal of Portfolio Management, 18(4):13–22, 1992.
16. Takeda A., Niranjan M., Gotoh J.-y., Kawahara Y. Simultaneous pursuit of out-of-sample performance and sparsity in index tracking portfolios. Computational Management Science, 10(1):21–49, 2013.
17. Brodie J., Daubechies I., De Mol C., Giannone D., Loris I. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences of the USA, 106(30):12267–12272, 2009.
18. Beasley J. E., Meade N., Chang T.-J. An evolutionary heuristic for the index tracking problem. European Journal of Operational Research, 148(3):621–643, 2003.
19. Das A., Kempe D. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In L. Getoor and T. Scheffer, editors, Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1057–1064, New York, NY, USA, 2011. ACM.
20. Sidorov S. P., Faizliev A. R., Khomchenko A. A. Algorithms for l1-norm minimization of index tracking error and their performance. International Journal of Mathematics in Operational Research, 2016, to appear.
21. Osborne M. R., Presnell B., Turlach B. A. A new approach to variable selection in least squares problems. IMA Journal of Numerical Analysis, 20(3):389–403, 2000.
22. Osborne M. R., Presnell B., Turlach B. A. On the LASSO and its dual. Journal of Computational and Graphical Statistics, 9(2):319–337, 2000.
23. Efron B., Hastie T., Johnstone I., Tibshirani R. Least angle regression. The Annals of Statistics, 32(2):407–499, 2004.