=Paper=
{{Paper
|id=Vol-1623/papermp13
|storemode=property
|title=Dual Greedy Algorithm for Conic Optimization Problem
|pdfUrl=https://ceur-ws.org/Vol-1623/papermp13.pdf
|volume=Vol-1623
|authors=Sergei Sidorov, Sergei Mironov, Michael Pleshakov
|dblpUrl=https://dblp.org/rec/conf/door/SidorovMP16
}}
==Dual Greedy Algorithm for Conic Optimization Problem==
S. P. Sidorov, S. V. Mironov, M. G. Pleshakov

Saratov State University, Russian Federation, http://www.sgu.ru

'''Abstract.''' In this paper we propose an algorithm for finding approximate sparse solutions of a convex optimization problem with conic constraints, and we examine the convergence properties of the algorithm with applications to the index tracking problem and to unconstrained l1-penalized regression.

'''Keywords:''' greedy algorithm; constrained convex optimization; conic optimization problem; index tracking problem.

This work was supported by the Russian Foundation for Basic Research under Grant 14-01-00140. Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: A. Kononov et al. (eds.): DOOR 2016, Vladivostok, Russia, published at http://ceur-ws.org.

===1 Introduction===

Greedy algorithms have been studied intensively since the 1980s; their main merit is that they provide constructive methods for finding best m-term approximations. The main contributions to the development of greedy algorithms were made by J. Friedman [1], S. Mallat [2], J. Zhang [3], P. Huber [4], L. Jones [5], A. Barron [6], R. DeVore, V. N. Temlyakov [7], S. V. Konyagin [8] and others.

Let X be a Banach space with norm ∥·∥_X. A set of elements D from the space X is called a dictionary if each element g ∈ D has norm bounded by one, ∥g∥ ≤ 1, and the closure of span D is X, i.e. $\overline{\operatorname{span}\, D} = X$. A dictionary D is called symmetric if −g ∈ D for every g ∈ D. In this paper we assume that the dictionary D is symmetric.

Let E be a convex function defined on X. The problem of convex optimization is to find an approximate solution to the problem

$$E(x) \to \min_{x \in X}. \qquad (1)$$

In many applications it is necessary to find solutions of problem (1) that are sparse with respect to D, i.e. we are interested in solving the following problem:

$$E(x) \to \inf_{x \in \Sigma_m(D)}, \qquad (2)$$

where Σ_m(D) is the set of all m-term polynomials with respect to D:

$$\Sigma_m(D) = \Bigl\{ x \in X : x = \sum_{i=1}^m c_i g_i, \ g_i \in D \Bigr\}. \qquad (3)$$

The paper of V. N. Temlyakov [9] examines greedy algorithms for finding an approximate solution to problem (2), and shows that greedy algorithms with respect to the dictionary D solve problem (1) as well. The paper [9] proposes the weak greedy Chebyshev algorithm and proves estimates of the convergence rate of the algorithm based on geometric properties of the function E. Developments of the ideas presented in [9] can be found in [10], [11].

Let Y be the m-dimensional Euclidean space, Y = R^m. Let E be a convex function defined on X. Many convex constrained optimization problems can be expressed in the conic form

$$E(x) \to \min_{A(x)+b \in K}, \qquad (4)$$

where A : X → Y is a linear operator, b ∈ Y, and K is a closed cone in Y. Recall that K is said to be a cone in Y if a_1 x_1 + a_2 x_2 ∈ K for any x_1, x_2 ∈ K and a_1, a_2 ∈ [0, ∞). At first sight the constraint A(x) + b ∈ K may seem too specialized but, as pointed out in [12], any convex subset of Y = R^m can be represented in conic form. In particular, it was shown in [12] that both the Dantzig selector problem [13] and the LASSO regression problem [14] can be reformulated in the form (4); a small worked example of such a reformulation is given below.
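As an elementary illustration (our example, not taken from [12]): a single affine equality constraint ⟨a, x⟩ = β on x ∈ R^n can be encoded as a pair of inequalities landing in the nonnegative orthant. The same device is used for the budget constraint of the index tracking problem in Section 3.1.

```latex
% Encoding the equality constraint <a, x> = beta in the conic form (4):
% it holds iff both <a, x> - beta >= 0 and beta - <a, x> >= 0.
A(x) = \begin{pmatrix} \langle a, x \rangle \\ -\langle a, x \rangle \end{pmatrix},
\qquad
b = \begin{pmatrix} -\beta \\ \beta \end{pmatrix},
\qquad
K = \mathbb{R}^2_{+},
\qquad
A(x) + b \in K \;\Longleftrightarrow\; \langle a, x \rangle = \beta .
```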
We are interested in finding an approximate solution to problem (4). Furthermore, in many applications it is necessary to find solutions of (4) that are sparse with respect to the dictionary D:

$$E(x) \to \inf_{x \in \Sigma_m(D),\; A(x)+b \in K}. \qquad (5)$$

In this paper we propose the dual weak greedy Chebyshev algorithm for finding approximate sparse solutions of the convex optimization problem with a conic constraint, and we examine its convergence properties.

===2 Dual weak greedy Chebyshev algorithm===

Solving problem (4) directly may be computationally inefficient, since projecting onto the set {x : A(x) + b ∈ K} may be expensive, as may be finding even a single feasible point. For example, projection onto the set {x : ∥y − Ax∥₂ ≤ ε}, x ∈ R^n, y ∈ R^m, is computationally expensive, whereas projection onto the dual cone is trivial.

The dual of problem (4) is

$$F(\lambda) \to \max_{\lambda \in K^*}, \qquad (6)$$

where F(λ) is the Lagrange dual function

$$F(\lambda) := \inf_x L(x, \lambda) = \inf_x \bigl( E(x) - \langle \lambda, A(x) + b \rangle \bigr),$$

and K* is the dual cone, K* := {λ ∈ Y : ⟨λ, y⟩ ≥ 0 for all y ∈ K}. Let A* : R^m → R^n denote the adjoint of the linear operator A, and let E* be the convex conjugate of E, i.e.

$$E^*(z) = \sup_x \bigl( \langle z, x \rangle - E(x) \bigr).$$

Then the dual problem (6) can be rewritten in the form

$$-E^*(A^*(\lambda)) - \langle b, \lambda \rangle \to \max_{\lambda \in K^*}.$$

Recall that the following inequality holds for any feasible primal/dual pair x and λ:

$$E(x) - F(\lambda) = E(x) + E^*(A^*(\lambda)) + \langle b, \lambda \rangle \ge \langle x, A^*(\lambda) \rangle + \langle b, \lambda \rangle = \langle A(x) + b, \lambda \rangle \ge 0.$$

If the primal and dual problems are strictly feasible, then E(x) = F(λ) at optimal solutions, and there exist (not necessarily unique) points x*, λ* such that E(x*) = F(λ*) = L(x*, λ*), satisfying the optimality conditions

$$A(x^*) + b \in K, \quad \lambda^* \in K^*, \quad \langle A(x^*) + b, \lambda^* \rangle = 0, \quad A^*(\lambda^*) \in \partial E(x^*), \qquad (7)$$

where ∂E is the subdifferential of E.

Let τ := {t_m}_{m=1}^∞, t_m ∈ [0, 1], be a weakness sequence, and let λ* satisfy (7). We suppose that the function E is Fréchet differentiable. Note that convexity of E implies that for any x, y

$$E(y) \ge E(x) + \langle E'(x), y - x \rangle,$$

where E'(x) denotes the Fréchet derivative of E at x. Denote

$$Q(x) := E(x) - \langle A(x) + b, \lambda^* \rangle.$$

'''Dual weak greedy Chebyshev algorithm.''' Let G_0 := 0. For each m ≥ 1 we define G_m by induction as follows.

1. (Gradient greedy step) Find an element φ_m ∈ D satisfying
$$\langle -Q'(G_{m-1}), \varphi_m \rangle \ge t_m \sup_{g \in D} \langle -Q'(G_{m-1}), g \rangle.$$

2. (Chebyshev-type step) Find real numbers c_i^*, i = 1, …, m, such that
$$Q\Bigl( \sum_{i=1}^m c_i^* \varphi_i \Bigr) = \inf_{c_1, \dots, c_m} Q\Bigl( \sum_{i=1}^m c_i \varphi_i \Bigr).$$

3. Set $G_m = \sum_{i=1}^m c_i^* \varphi_i$.

The gradient greedy step maximizes a functional determined by the gradient information from the previous step of the algorithm, and the Chebyshev-type step finds the best linear combination of the m approximants {φ_i}_{i=1}^m; see the sketch below.
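To make the two steps concrete, here is a minimal Python sketch for the finite-dimensional case, under our own assumptions: the dictionary is finite and stored as the columns of a matrix, Q and its gradient are supplied by the caller (i.e. λ* is assumed known), and the Chebyshev-type step is delegated to scipy.optimize.minimize. This is an illustration of the scheme above, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def dual_weak_greedy_chebyshev(Q, grad_Q, dictionary, n_steps):
    """Minimal sketch of the dual weak greedy Chebyshev algorithm.

    Q          -- callable, the objective Q(x) = E(x) - <A(x) + b, lambda*>
    grad_Q     -- callable, gradient Q'(x)
    dictionary -- (n, N) array whose columns are the dictionary elements;
                  a symmetric dictionary contains both g and -g as columns
    n_steps    -- number of greedy iterations m
    """
    n, _ = dictionary.shape
    G = np.zeros(n)   # G_0 = 0
    selected = []     # indices of the chosen elements phi_1, ..., phi_m

    for m in range(1, n_steps + 1):
        # Gradient greedy step: choose phi_m with
        #   <-Q'(G_{m-1}), phi_m> >= t_m * sup_{g in D} <-Q'(G_{m-1}), g>.
        # Taking the exact maximizer corresponds to t_m = 1, which satisfies
        # the weak condition for every t_m in [0, 1].
        scores = dictionary.T @ (-grad_Q(G))
        selected.append(int(np.argmax(scores)))
        Phi = dictionary[:, selected]   # (n, m) matrix of phi_1, ..., phi_m

        # Chebyshev-type step: minimize Q over span{phi_1, ..., phi_m}.
        res = minimize(lambda c: Q(Phi @ c), x0=np.zeros(m),
                       jac=lambda c: Phi.T @ grad_Q(Phi @ c))
        G = Phi @ res.x                 # G_m = sum_i c_i^* phi_i
    return G
```

For the quadratic objectives arising in Section 3 the Chebyshev-type step reduces to a small least-squares problem and admits a closed-form solution; the generic minimizer is used here only to keep the sketch short.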
Let Ω := {x ∈ X : E(x) ≤ E(0)} and suppose that Ω is bounded. The modulus of smoothness of the function E on the bounded set Ω is defined by

$$\rho(E, u) = \frac{1}{2} \sup_{x \in \Omega,\, \|y\|=1} |E(x + uy) + E(x - uy) - 2E(x)|. \qquad (8)$$

Let A_1(D) denote the closure (in X) of the convex hull of D. Using the results and ideas of [9], we can prove the following proposition.

'''Theorem 1.''' Let E be a uniformly smooth convex function with modulus of smoothness ρ(E, u) ≤ γu^q, 1 < q ≤ 2. Let V := {x ∈ X : A(x) + b ∈ K}. Let ε > 0 and let the element f^ε ∈ D be such that

$$Q(f^\varepsilon) \le \inf_{x \in D} Q(x) + \varepsilon, \qquad f^\varepsilon / C(\varepsilon) \in A_1(D),$$

for some real number C(ε) ≥ 1. Let p = q/(q − 1). Then

$$E(G_m) - \inf_{x \in \Sigma_m(D) \cap V} E(x) \le \max\Biggl\{ 2\varepsilon,\; C(q, \gamma) A(\varepsilon) \Bigl( C(E, A, b, K, q, \gamma) + \sum_{k=1}^m t_k^p \Bigr)^{1-q} \Biggr\}, \qquad (9)$$

where C(q, γ) and C(E, A, b, K, q, γ) are positive constants not depending on k.

'''Proof.''' We have ρ(Q, u) = ρ(E, u). It follows from Theorem 4.2 of [9] that

$$Q(G_m) - \inf_{x \in D} Q(x) \le \max\Biggl\{ 2\varepsilon,\; C(q, \gamma) A(\varepsilon) \Bigl( C(Q, q, \gamma) + \sum_{k=1}^m t_k^p \Bigr)^{1-q} \Biggr\}. \qquad (10)$$

Then (9) follows from (10) and the chain

$$Q(G_m) - \inf_{x \in \Omega} Q(x) = E(G_m) - \langle A(G_m) + b, \lambda^* \rangle - \inf_{x \in \Omega} \bigl( E(x) - \langle A(x) + b, \lambda^* \rangle \bigr) \ge$$
$$\ge E(G_m) - \langle A(G_m) + b, \lambda^* \rangle - \inf_{x \in \Omega} \bigl( E(x) - \langle A(G_m) + b, \lambda^* \rangle \bigr) =$$
$$= E(G_m) - \inf_{x \in \Omega} E(x) \ge E(G_m) - \inf_{x \in \Sigma_m(D) \cap V} E(x). \qquad \square$$

===3 Applications===

====3.1 Index tracking problem====

Greedy algorithms have shown excellent performance in practical problems of machine learning and optimization. This section shows the use of such techniques for the index tracking problem. Index tracking is a passive financial strategy that tries to replicate the performance of a given index or benchmark. The aim of the investor is to find the weights of the assets in her/his portfolio that minimize the tracking error, i.e. the difference between the performance of the index and that of the portfolio.

For any q > 0 and x = (x_1, …, x_n)^T ∈ R^n, let $\|x\|_q := \bigl( \sum_{i=1}^n |x_i|^q \bigr)^{1/q}$ and let $\|x\|_0 = \lim_{q \to 0+} \|x\|_q^q$ be the number of non-zero elements of x. If q ≥ 1, then ∥x∥_q is the l_q-norm of x ∈ R^n.

Let n be the number of investable assets. Denote by r_{ti} the return of asset i at time t, 1 ≤ i ≤ n, 1 ≤ t ≤ k, and let R = (r_{ti}) be the k × n matrix of returns. A portfolio is defined as a vector of weights, x = (x_1, …, x_n)^T ∈ R^n. We do not allow the portfolio to change over time and do not take transaction costs into account. We assume that

1. one unit of capital is available, i.e. x^T 1_n = 1, where 1_n denotes the vector in R^n in which every component is equal to 1;
2. short selling is allowed, i.e. the weights x_i can be negative.

Let I_t be the index return at time t, 1 ≤ t ≤ k, and I = (I_1, …, I_k)^T ∈ R^k. In traditional index tracking optimization the objective is to find a portfolio with minimal tracking error variance, the sum of squared deviations between portfolio returns and market index returns (see e.g. [15]):

$$x^* = \arg\min \frac{1}{k} \|I - Rx\|_2^2 \quad \text{s.t.} \quad x^T 1_n = 1. \qquad (11)$$

It should be noted that the standard Markowitz model is a special case of the index tracking portfolio model (11) (see, for example, [16], [17]). Since (11) is a convex optimization problem, it can easily be solved by the method of Lagrange multipliers.

In this section we examine an algorithm for solving problem (11) with a cardinality constraint:

$$x^* = \arg\min \frac{1}{k} \|I - Rx\|_2^2 \quad \text{s.t.} \quad x^T 1_n = 1, \ \|x\|_0 \le K, \qquad (12)$$

where K is the limit on the number of assets with non-zero weights in the portfolio. It is supposed that K is substantially smaller than n, K ≪ n.

Classical methods usually require convexity and relatively well-behaved objective functions, and they typically rely on gradients for the descent direction. Standard optimization techniques therefore can no longer be used once we add a constraint on the number of assets in the portfolio: classical methods do not work efficiently on such problems, and many researchers resort to heuristic optimization [18]. To solve problem (12), in this paper we propose to use greedy algorithms. The choice of a greedy algorithm for our analysis is based on the fact that greedy algorithms showed excellent performance on practical problems in [19], [16], and therefore they look promising for the cardinality constrained index tracking problem. On the other hand, greedy algorithms do not necessarily yield an optimal solution.

The constraint x^T 1_n = 1 can be rewritten in the conic form A(x) + b ∈ K with the 2 × n matrix

$$A = \begin{pmatrix} 1 & 1 & \dots & 1 \\ -1 & -1 & \dots & -1 \end{pmatrix}, \qquad b = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad K = \mathbb{R}^2_+ := \{x \in \mathbb{R}^2 : x \ge 0\}. \qquad (13)$$

The following proposition follows from Theorem 1 and gives the rate of convergence of the dual greedy algorithm for the index tracking problem.

'''Corollary 1.''' Let E(x) := (1/k)∥I − Rx∥₂² and V := {x ∈ R^n : x^T 1_n = 1}. Let the dictionary be D = {±e_j}_{j=1}^n, where e_j ∈ R^n with e_{ji} = 1 if i = j and e_{ji} = 0 otherwise. Then

$$E(G_m) - \inf_{x \in \Sigma_m(D) \cap V} E(x) = O(m^{-1}),$$

where G_m is the element obtained at step m of the dual greedy algorithm with A, b, K defined in (13).

'''Proof.''' We have K* = K = R²₊. It is easy to verify that ρ(E, u) ≤ γu² with $\gamma = \frac{1}{k} \sup_{\|y\|=1} \|Ry\|_2^2$: by the parallelogram identity,

$$\rho(E, u) = \frac{1}{2k} \sup_{x \in \Omega,\, \|y\|=1} \bigl( \|I - Rx - uRy\|_2^2 + \|I - Rx + uRy\|_2^2 - 2\|I - Rx\|_2^2 \bigr) = \frac{1}{2k} \sup_{\|y\|=1} 2u^2 \|Ry\|_2^2 = \gamma u^2. \qquad \square$$

A numerical sketch of this setting is given below.
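To make the corollary concrete, the following sketch applies the dual_weak_greedy_chebyshev routine from the Section 2 sketch to a small synthetic instance of (12). All specifics (the sizes k and n, the simulated returns, and the way λ* is obtained by solving the KKT system of (11)) are our illustrative assumptions rather than choices made in the paper.

```python
import numpy as np

# Synthetic data for a small instance of the index tracking problem (12);
# sizes and the random return model are our own illustrative choices.
rng = np.random.default_rng(0)
k, n = 250, 40                                 # observations, assets
R = rng.normal(0.0, 0.01, size=(k, n))         # asset returns
I = R[:, :5] @ np.full(5, 0.2) + rng.normal(0, 0.001, k)  # index from 5 assets

E = lambda x: np.sum((I - R @ x) ** 2) / k
grad_E = lambda x: 2.0 / k * R.T @ (R @ x - I)

# Solve the KKT system of (11) for the multiplier mu of the budget constraint
# x^T 1_n = 1; then lambda* = (max(mu, 0), max(-mu, 0)) lies in K* = R^2_+ and
# satisfies the optimality conditions (7) for the conic form (13).
KKT = np.block([[2.0 / k * R.T @ R, -np.ones((n, 1))],
                [np.ones((1, n)),    np.zeros((1, 1))]])
rhs = np.concatenate([2.0 / k * R.T @ I, [1.0]])
mu = np.linalg.solve(KKT, rhs)[-1]

# Q(x) = E(x) - <A(x) + b, lambda*> = E(x) - mu * (x^T 1_n - 1).
Q = lambda x: E(x) - mu * (np.sum(x) - 1.0)
grad_Q = lambda x: grad_E(x) - mu * np.ones(n)

# Dictionary {+-e_j} of Corollary 1: the columns of [Id, -Id].
D = np.hstack([np.eye(n), -np.eye(n)])

G = dual_weak_greedy_chebyshev(Q, grad_Q, D, n_steps=10)  # sparse portfolio
print("tracking error:", E(G),
      "active assets:", np.count_nonzero(np.round(G, 8)))
```

Since A(x*) + b = 0 at any budget-feasible x*, the complementary slackness condition in (7) holds automatically for this choice of λ*.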
Primal greedy algorithms for the index tracking problem were examined empirically in [16] (in the l₂-norm) and in [20] (in the l₁-norm). An analysis of heuristic algorithms for portfolio optimization problems with a cardinality constraint can be found in [18].

====3.2 Unconstrained l1-penalized regression====

Consider the problem of recovering an unknown vector x₀ ∈ R^n from the data y ∈ R^k and the model

$$y = Bx_0 + z, \qquad (14)$$

where B is a known k × n matrix and z is noise. In many practical problems there are fewer observations/measurements than unknowns, i.e. k ≪ n. Recent works have shown that accurate estimation is often possible under reasonable sparsity assumptions on x₀. One practically and theoretically effective estimator is unconstrained l₁-penalized regression:

$$\|y - Bx\|_2^2 + \lambda \|x\|_1 \to \min, \qquad (15)$$

where λ is a positive real parameter. A homotopy method for solving problem (15) was proposed in [21], [22]; the method is also known as Least Angle Regression (LARS) [23].

Let us rewrite problem (15) in the following way:

$$\|y - Bx\|_2 \to \min \quad \text{s.t.} \quad \|x\|_1 \le \varepsilon, \qquad (16)$$

where ε is a positive scalar. The value of ε should be adjusted so that the true x₀ is feasible, at least with high probability, when the noise term z is stochastic. An equivalent conic formulation of problem (16) is

$$E(x) = \|y - Bx\|_2, \qquad A(x) = (x, 0), \qquad b = (0, \varepsilon), \qquad K = L_1^n, \qquad (17)$$

where $L_1^n := \{(u, t) \in \mathbb{R}^{n+1} : \|u\|_1 \le t\}$, so that A(x) + b = (x, ε) ∈ L_1^n expresses precisely the constraint ∥x∥₁ ≤ ε.

The following corollary of Theorem 1 shows that the rate of convergence of the dual greedy algorithm for the l₁-penalized regression problem (16) is m^{−1}.

'''Corollary 2.''' Let E(x) := ∥y − Bx∥₂² (the square of the objective in (16), which has the same minimizers) and V := {x ∈ R^n : ∥x∥₁ ≤ ε}. Let the dictionary D consist of all columns of the matrix B multiplied by ±1. Then

$$E(G_m) - \inf_{x \in \Sigma_m(D) \cap V} E(x) = O(m^{-1}),$$

where G_m is the element obtained at step m of the dual greedy algorithm with A, b, K defined in (17).

'''Proof.''' As in the proof of Corollary 1, the parallelogram identity gives ρ(E, u) ≤ γu² with $\gamma = \sup_{\|v\|=1} \|Bv\|_2^2$. □

A code sketch of this setting follows.
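The sketch below instantiates the greedy routine for (16) with the dictionary of signed columns of B, as in Corollary 2. The sizes, the noise model, and the normalization of the columns (dictionary elements must have norm at most one) are our own assumptions; we also pick ε large enough that the l₁ constraint is inactive at the optimum, in which case λ* = 0 and Q coincides with E, so no dual correction term appears.

```python
import numpy as np

# Synthetic sparse regression instance y = B x0 + z with k << n; the sizes,
# noise level, and column normalization are our illustrative assumptions.
rng = np.random.default_rng(1)
k, n = 100, 400
B = rng.normal(size=(k, n))
B /= np.linalg.norm(B, axis=0)        # dictionary elements must have norm <= 1
x0 = np.zeros(n)
x0[rng.choice(n, 8, replace=False)] = rng.normal(size=8)
y = B @ x0 + 0.01 * rng.normal(size=k)

# Dictionary of Corollary 2: all columns of B multiplied by +-1. We run the
# greedy routine in the response space R^k, approximating y directly; the
# selection step then picks the signed column most correlated with the
# current residual.
E = lambda u: np.sum((y - u) ** 2)
grad_E = lambda u: 2.0 * (u - y)
D = np.hstack([B, -B])

G = dual_weak_greedy_chebyshev(E, grad_E, D, n_steps=8)
print("residual norm:", np.linalg.norm(y - G))
```

After m steps, G is an m-term combination of signed columns of B, so the corresponding coefficient vector is m-sparse, matching the role of Σ_m(D) in Corollary 2.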
===4 Conclusion===

In this paper we have used greedy-type algorithms to solve (approximately) the problem

$$E(x) \to \inf_{x \in \Sigma_m(D),\; A(x)+b \in K},$$

where E is a convex function defined on a Banach space X, A : X → Y is a linear operator, b ∈ Y, and K is a closed cone in Y. Since solving problem (4) directly can be computationally inefficient, we propose the dual greedy algorithm for the conic optimization problem. Theorem 1 gives estimates of the rate of convergence of the dual greedy algorithm. We should mention that the approach used in this paper is based on the ideas and results developed in [9]. Based on Theorem 1, we proved that the rate of convergence of the dual greedy algorithm for the index tracking problem (12) and for the unconstrained l₁-penalized regression (16) is m^{−1}.

===References===

1. Friedman J. Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5):1189–1232, 2001.
2. Davis G., Mallat S., Avellaneda M. Adaptive greedy approximations. Constructive Approximation, 13:57–98, 1997.
3. Zhang Z., Schwartz S., Wagner L., Miller W. A greedy algorithm for aligning DNA sequences. Journal of Computational Biology, 7(1–2):203–214, 2000.
4. Huber P. J. Projection pursuit. The Annals of Statistics, 13:435–525, 1985.
5. Jones L. On a conjecture of Huber concerning the convergence of projection pursuit regression. The Annals of Statistics, 15:880–882, 1987.
6. Barron A. R., Cohen A., Dahmen W., DeVore R. A. Approximation and learning by greedy algorithms. The Annals of Statistics, 36(1):64–94, 2008.
7. DeVore R. A., Temlyakov V. N. Some remarks on greedy algorithms. Advances in Computational Mathematics, 5:173–187, 1996.
8. Konyagin S. V., Temlyakov V. N. A remark on greedy approximation in Banach spaces. East Journal on Approximations, 5(3):365–379, 1999.
9. Temlyakov V. N. Greedy approximation in convex optimization. Constructive Approximation, 41(2):269–296, 2015.
10. Nguyen H., Petrova G. Greedy strategies for convex optimization. Calcolo, 2016, 1–18.
11. Temlyakov V. N. Dictionary descent in optimization. Analysis Mathematica, 42(1):69–89, 2016.
12. Becker S. R., Candès E. J., Grant M. C. Templates for convex cone problems with applications to sparse signal recovery. Mathematical Programming Computation, 3(3):165–218, 2011.
13. Candès E., Tao T. The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 35:2313–2351, 2007.
14. Tibshirani R. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.
15. Roll R. A mean/variance analysis of tracking error. The Journal of Portfolio Management, 18(4):13–22, 1992.
16. Takeda A., Niranjan M., Gotoh J.-y., Kawahara Y. Simultaneous pursuit of out-of-sample performance and sparsity in index tracking portfolios. Computational Management Science, 10(1):21–49, 2013.
17. Brodie J., Daubechies I., De Mol C., Giannone D., Loris I. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences of the USA, 106(30):12267–12272, 2009.
18. Beasley J. E., Meade N., Chang T.-J. An evolutionary heuristic for the index tracking problem. European Journal of Operational Research, 148(3):621–643, 2003.
19. Das A., Kempe D. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In L. Getoor and T. Scheffer, editors, Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1057–1064, New York, NY, USA, 2011. ACM.
20. Sidorov S. P., Faizliev A. R., Khomchenko A. A. Algorithms for l1-norm minimization of index tracking error and their performance. International Journal of Mathematics in Operational Research, 2016, to appear.
21. Osborne M. R., Presnell B., Turlach B. A. A new approach to variable selection in least squares problems. IMA Journal of Numerical Analysis, 20(3):389–403, 2000.
22. Osborne M. R., Presnell B., Turlach B. A. On the LASSO and its dual. Journal of Computational and Graphical Statistics, 9(2):319–337, 2000.
23. Efron B., Hastie T., Johnstone I., Tibshirani R. Least angle regression. The Annals of Statistics, 32(2):407–499, 2004.