
One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

Fedor S. Stonyakin

Alexander A. Titov

a.a.titov@phystech.edu

Moscow Institute of Physics and Technologies, Moscow, Russia

V. I. Vernadsky Crimean Federal University, Simferopol, Russia

pp. 372–384

The paper is devoted to a special Mirror Descent algorithm for problems of convex minimization with functional constraints. The objective function may not satisfy the Lipschitz condition, but it must have a Lipschitz-continuous gradient. We assume that the functional constraint can be non-smooth but satisfies the Lipschitz condition. In particular, such functionals appear in the well-known Truss Topology Design problem. Also, we apply the technique of restarts in the mentioned version of Mirror Descent for strongly convex problems. Some estimates for the rate of convergence are obtained for the Mirror Descent algorithms under consideration.

Keywords: Adaptive Mirror Descent algorithm, Lipschitz-continuous gradient, Technique of restarts

The optimization of non-smooth functionals with constraints attracts widespread interest in large-scale optimization and its applications [6, 18]. There are various methods of solving optimization problems of this kind. Some examples of these methods are the bundle-level method [14], the penalty method [19], and the Lagrange multipliers method [7]. Among them, Mirror Descent (MD) [4, 12] is viewed as a simple method for non-smooth convex optimization.

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: S. Belim et al. (eds.): OPTA-SCL 2018, Omsk, Russia, published at http://ceur-ws.org

In optimization problems with quadratic functionals we consider functionals which do not satisfy the usual Lipschitz property (or whose Lipschitz constant is quite large), but which have a Lipschitz-continuous gradient. For such problems, in ([2], item 3.3) the ideas of [14, 15] were adopted to construct an adaptive version of the Mirror Descent algorithm. For example, let $A_i$ ($i = 1, \ldots, m$) be a positive-semidefinite matrix ($x^T A_i x \geq 0 \ \forall x$) and let the objective function be

$$f(x) = \max_{1 \leq i \leq m} f_i(x), \qquad f_i(x) = \frac{1}{2} \langle A_i x, x \rangle - \langle b_i, x \rangle + \alpha_i, \quad i = 1, \ldots, m.$$

Note that such functionals appear in the Truss Topology Design problem with weights of the bars [15].
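As a small numerical illustration of why such objectives have non-standard growth, the sketch below evaluates a maximum of quadratics together with one of its subgradients. It assumes the form $f_i(x) = \frac{1}{2}\langle A_i x, x\rangle - \langle b_i, x\rangle + \alpha_i$ with symmetric $A_i$; the helper names (`max_quadratic`, `PIECES`) and the data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Piece = Tuple[Matrix, Vector, float]  # (A_i, b_i, alpha_i), A_i assumed symmetric


def matvec(a: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def max_quadratic(pieces: List[Piece], x: Vector) -> Tuple[float, Vector]:
    """Return f(x) = max_i f_i(x) and the gradient of one active piece,
    which is a subgradient of f at x."""
    best_val = float("-inf")
    best_grad: Vector = []
    for a, b, alpha in pieces:
        ax = matvec(a, x)
        val = 0.5 * dot(ax, x) - dot(b, x) + alpha
        if val > best_val:
            best_val = val
            best_grad = [axi - bi for axi, bi in zip(ax, b)]  # A_i x - b_i
    return best_val, best_grad


# Illustrative data (hypothetical): two quadratics in the plane.
PIECES: List[Piece] = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], 0.0),
    ([[2.0, 0.0], [0.0, 1.0]], [1.0, 0.0], -1.0),
]

if __name__ == "__main__":
    for point in ([1.0, 1.0], [10.0, 10.0]):
        value, subgrad = max_quadratic(PIECES, point)
        print(point, value, subgrad)
```

The subgradient norm grows with $\|x\|$, so $f$ has no global Lipschitz constant on an unbounded set, while the gradient of each piece is Lipschitz-continuous with a constant given by the largest eigenvalue of $A_i$ (in the Euclidean norm).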

In this paper we propose a partially adaptive (with respect to the objective functional) version of the algorithm from ([2], item 3.3). It simplifies work with problems where calculating the norm of the subgradient of the functional constraint is burdensome in view of the large number of constraints. The idea of restarts [9] is adopted to construct the proposed algorithm in the case of strongly convex objective and constraints. It is well-known that both considered methods are optimal in terms of the lower bounds [12].

Note that a functional constraint, generally, can be non-smooth. That is why we consider subgradient methods. These methods have a long history starting with the method for deterministic unconstrained problems and Euclidean setting in [17] and the generalization for constrained problems in [16], where the idea of switching steps between the direction of a subgradient of the objective and the direction of a subgradient of the constraint was suggested. The non-Euclidean extension, usually referred to as Mirror Descent, originated in [10, 12] and was later analyzed in [4]. An extension for constrained problems was proposed in [12], see also a recent version in [3]. To prove a faster convergence rate of Mirror Descent for a strongly convex objective in the unconstrained case, the restart technique [11–13] was used in [8]. Usually, the stepsize and stopping rule for Mirror Descent require knowledge of the Lipschitz constant of the objective function and of the constraint, if any. Adaptive stepsizes, which do not require this information, are considered in [5] for problems without inequality constraints, and in [3] for constrained problems.

We consider some Mirror Descent algorithms for constrained problems in the case of non-standard growth properties of the objective functional (it has a Lipschitz-continuous gradient).

The paper consists of an Introduction and three main sections. In Section 2 we give some basic notation concerning convex optimization problems with functional constraints. In Section 3 we describe a partially adaptive version (Algorithm 2) of the Mirror Descent algorithm from ([2], item 3.3) and prove some estimates for the rate of convergence of Algorithm 2. The last Section 4 is focused on the strongly convex case with restarting Algorithm 2 and the corresponding theoretical estimates for the rate of convergence.

Problem Statement and Standard Mirror Descent Basics

Let $(E, \|\cdot\|)$ be a normed finite-dimensional vector space and $E^*$ be the conjugate space of $E$ with the norm

$$\|y\|_* = \max_x \{\langle y, x \rangle : \|x\| \leq 1\},$$

where $\langle y, x \rangle$ is the value of the continuous linear functional $y$ at $x \in E$.

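Let $X \subset E$ be a (simple) closed convex set. We consider two convex subdifferentiable functionals $f$ and $g : X \to \mathbb{R}$. Also, we assume that $g$ is Lipschitz-continuous:

$$|g(x) - g(y)| \leq M_g \|x - y\| \quad \forall x, y \in X. \tag{1}$$

We consider the following problem:

$$f(x) \to \min_{x \in X}, \tag{2}$$

$$\text{s.t.} \quad g(x) \leq 0. \tag{3}$$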

Let $d : X \to \mathbb{R}$ be a distance generating function (d.g.f.) which is continuously differentiable and 1-strongly convex w.r.t. the norm $\|\cdot\|$, i.e.

$$\langle \nabla d(x) - \nabla d(y), x - y \rangle \geq \|x - y\|^2 \quad \forall x, y \in X,$$

and assume that $\min_{x \in X} d(x) = d(0)$. Suppose we have a constant $\Theta_0$ such that $d(x_*) \leq \Theta_0^2$, where $x_*$ is a solution of (2)–(3).

Note that if there is a set of optimal points $X_*$, then we may assume that

$$\min_{x_* \in X_*} d(x_*) \leq \Theta_0^2.$$

For all $x, y \in X$ consider the corresponding Bregman divergence

$$V(x, y) = d(y) - d(x) - \langle \nabla d(x), y - x \rangle.$$

Standard proximal setups, i.e. Euclidean, entropy, $\ell_1/\ell_2$, simplex, nuclear norm, spectahedron, can be found, e.g. in [5]. Let us define the proximal mapping operator in the standard way:

$$\mathrm{Mirr}_x(p) = \arg\min_{u \in X} \{\langle p, u \rangle + V(x, u)\} \quad \text{for each } x \in X \text{ and } p \in E^*.$$

We make the simplicity assumption, which means that $\mathrm{Mirr}_x(p)$ is easily computable.
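To make the simplicity assumption concrete, the sketch below writes out the Bregman divergence and the proximal mapping in closed form for two of the standard setups. It assumes $d(x) = \frac{1}{2}\|x\|_2^2$ with $X = \mathbb{R}^n$ for the Euclidean case and the negative entropy $d(x) = \sum_i x_i \ln x_i$ on the probability simplex with strictly positive points for the entropy case; all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def bregman_euclidean(x: Vector, y: Vector) -> float:
    """V(x, y) for d(x) = 0.5 * ||x||_2^2, which equals 0.5 * ||y - x||_2^2."""
    return 0.5 * sum((yi - xi) ** 2 for xi, yi in zip(x, y))


def mirr_euclidean(x: Vector, p: Vector) -> Vector:
    """Mirr_x(p) for the Euclidean setup with X = R^n: the plain step x - p."""
    return [xi - pi for xi, pi in zip(x, p)]


def bregman_entropy(x: Vector, y: Vector) -> float:
    """V(x, y) for the negative entropy on the simplex: KL(y || x)."""
    return sum(yi * math.log(yi / xi) for xi, yi in zip(x, y) if yi > 0.0)


def mirr_entropy(x: Vector, p: Vector) -> Vector:
    """Mirr_x(p) on the simplex: multiplicative update followed by normalisation."""
    shift = min(p)  # adding a constant to p does not change the minimiser
    weights = [xi * math.exp(-(pi - shift)) for xi, pi in zip(x, p)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(mirr_euclidean([1.0, 2.0], [0.5, -1.0]))
    print(mirr_entropy([0.5, 0.5], [0.0, math.log(2.0)]))
```

In both cases one proximal step costs $O(n)$ operations, which is the situation the simplicity assumption is meant to cover.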

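Some Mirror Descent Algorithm for the Type of Problems Under Consideration

Following [14], given a function $f$, a subgradient $\nabla f(x)$ at a point $x \in X$ and a point $y \in X$, we define

$$v_f(x, y) = \begin{cases} \left\langle \dfrac{\nabla f(x)}{\|\nabla f(x)\|_*}, x - y \right\rangle, & \nabla f(x) \neq 0, \\ 0, & \nabla f(x) = 0, \end{cases} \qquad x \in X. \tag{4}$$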
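The quantity in (4) is cheap to evaluate once a subgradient is available. The following minimal sketch assumes the Euclidean setup, where the dual norm is again the Euclidean norm; the function name `v_f` simply mirrors the notation and is not an API from any library.

```python
import math
from typing import List

Vector = List[float]


def v_f(grad_x: Vector, x: Vector, y: Vector) -> float:
    """Evaluate definition (4), taking the Euclidean norm as the dual norm.

    grad_x is a subgradient of f at x; the result is the inner product of the
    normalised subgradient with x - y, or 0 if the subgradient vanishes.
    """
    norm = math.sqrt(sum(gi * gi for gi in grad_x))
    if norm == 0.0:
        return 0.0
    return sum(gi * (xi - yi) for gi, xi, yi in zip(grad_x, x, y)) / norm


if __name__ == "__main__":
    # f(x) = 0.5 * ||x||^2 has gradient x, so at x = (3, 4) and y = 0 the
    # value is ||x|| = 5.
    print(v_f([3.0, 4.0], [3.0, 4.0], [0.0, 0.0]))
```

In the Euclidean case $v_f(x, y)$ can be read as the signed distance from $y$ to the hyperplane through $x$ orthogonal to $\nabla f(x)$.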

In ([2], item 3.3) the following adaptive Mirror Descent algorithm for Problem (2)–(3) was proposed by the first author.

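Algorithm 1 Adaptive Mirror Descent, Non-Standard Growth

if $g(x^N) \leq \varepsilon$ then
    $h_N \leftarrow \dfrac{\varepsilon}{\|\nabla f(x^N)\|_*}$
    $x^{N+1} \leftarrow \mathrm{Mirr}_{x^N}(h_N \nabla f(x^N))$ ("productive steps")
    $N \to I$
else ($g(x^N) > \varepsilon$)
    $h_N \leftarrow \dfrac{\varepsilon}{\|\nabla g(x^N)\|_*^2}$
    $x^{N+1} \leftarrow \mathrm{Mirr}_{x^N}(h_N \nabla g(x^N))$ ("non-productive steps")
Ensure: $\bar{x}^N := \arg\min_{x^k, \, k \in I} f(x^k)$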

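For the previous method the following result was obtained in [2].

Theorem 1. Let $\varepsilon > 0$ be a fixed positive number and let Algorithm 1 work

$$N = \left\lceil \frac{2 \max\{1, M_g^2\} \Theta_0^2}{\varepsilon^2} \right\rceil \tag{5}$$

steps. Then

$$\min_{k \in I} v_f(x^k, x_*) < \varepsilon. \tag{6}$$

Let us recall one well-known statement (see, e.g. [5]).

Lemma 1. Let $f : X \to \mathbb{R}$ be a convex subdifferentiable function over the convex set $X$ and let a sequence $\{x^k\}$ be defined by the following relation:

$$x^{k+1} = \mathrm{Mirr}_{x^k}(h_k \nabla f(x^k)).$$

Then for each $x \in X$

$$h_k \langle \nabla f(x^k), x^k - x \rangle \leq \frac{h_k^2}{2} \|\nabla f(x^k)\|_*^2 + V(x^k, x) - V(x^{k+1}, x). \tag{7}$$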

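Algorithm 2 Partial Adaptive Version of Algorithm 1

if $g(x^N) \leq \varepsilon$ then
    $h_N \leftarrow \dfrac{\varepsilon}{M_g \|\nabla f(x^N)\|_*}$
    $x^{N+1} \leftarrow \mathrm{Mirr}_{x^N}(h_N \nabla f(x^N))$ ("productive steps")
    $N \to I$
else ($g(x^N) > \varepsilon$)
    $h_N \leftarrow \dfrac{\varepsilon}{M_g^2}$
    $x^{N+1} \leftarrow \mathrm{Mirr}_{x^N}(h_N \nabla g(x^N))$ ("non-productive steps")
Ensure: $\bar{x}^N := \arg\min_{x^k, \, k \in I} f(x^k)$

Denote $[N] = \{0, 1, \ldots, N - 1\}$ and $J = [N] \setminus I$, where $I$ is a collection of indexes of productive steps

$$h_k = \frac{\varepsilon}{M_g \|\nabla f(x^k)\|_*}, \tag{8}$$

and $|I|$ is the number of "productive steps". Similarly, for "non-productive" steps from the set $J$ the analogous variable is defined as follows:

$$h_k = \frac{\varepsilon}{M_g^2}, \tag{9}$$

and $|J|$ is the number of "non-productive steps". Obviously,

$$|I| + |J| = N. \tag{10}$$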

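Let us formulate the following analogue of Theorem 1.

Theorem 2. Let $\varepsilon > 0$ be a fixed positive number and let Algorithm 2 work

$$N = \left\lceil \frac{2 M_g^2 \Theta_0^2}{\varepsilon^2} \right\rceil \tag{11}$$

steps. Then

$$\min_{k \in I} v_f(x^k, x_*) < \frac{\varepsilon}{M_g}. \tag{12}$$

Proof. 1) For the productive steps $k \in I$, Lemma 1 with the stepsize $h_k = \varepsilon / (M_g \|\nabla f(x^k)\|_*)$ gives

$$h_k \langle \nabla f(x^k), x^k - x \rangle \leq \frac{h_k^2}{2} \|\nabla f(x^k)\|_*^2 + V(x^k, x) - V(x^{k+1}, x).$$

Since $\frac{h_k^2}{2} \|\nabla f(x^k)\|_*^2 = \frac{\varepsilon^2}{2 M_g^2}$ and $h_k \langle \nabla f(x^k), x^k - x \rangle = \frac{\varepsilon}{M_g} v_f(x^k, x)$, we have

$$\frac{\varepsilon}{M_g} v_f(x^k, x) \leq \frac{\varepsilon^2}{2 M_g^2} + V(x^k, x) - V(x^{k+1}, x). \tag{13}$$

2) Similarly for the "non-productive" steps $k \in J$:

$$h_k (g(x^k) - g(x)) \leq \frac{h_k^2}{2} \|\nabla g(x^k)\|_*^2 + V(x^k, x) - V(x^{k+1}, x).$$

Using (1) and $\|\nabla g(x)\|_* \leq M_g$ we have

$$\frac{\varepsilon}{M_g^2} (g(x^k) - g(x)) \leq \frac{\varepsilon^2}{2 M_g^2} + V(x^k, x) - V(x^{k+1}, x). \tag{14}$$

3) From (13) and (14) for $x = x_*$ we have:

$$\frac{\varepsilon}{M_g} \sum_{k \in I} v_f(x^k, x_*) + \sum_{k \in J} \frac{\varepsilon}{M_g^2} (g(x^k) - g(x_*)) \leq N \frac{\varepsilon^2}{2 M_g^2} + \sum_{k=0}^{N-1} \left( V(x^k, x_*) - V(x^{k+1}, x_*) \right). \tag{15}$$

Let us note that for any $k \in J$

$$g(x^k) - g(x_*) \geq g(x^k) > \varepsilon,$$

and in view of

$$\sum_{k=0}^{N-1} \left( V(x^k, x_*) - V(x^{k+1}, x_*) \right) \leq \Theta_0^2$$

the inequality (15) can be transformed in the following way:

$$\frac{\varepsilon}{M_g} \sum_{k \in I} v_f(x^k, x_*) < N \frac{\varepsilon^2}{2 M_g^2} + \Theta_0^2 - \frac{\varepsilon^2}{M_g^2} |J|. \tag{16}$$

On the other hand,

$$\sum_{k \in I} v_f(x^k, x_*) \geq |I| \min_{k \in I} v_f(x^k, x_*).$$

Assume that

$$\frac{\varepsilon^2}{2 M_g^2} N \geq \Theta_0^2, \quad \text{or} \quad N \geq \frac{2 M_g^2 \Theta_0^2}{\varepsilon^2}.$$

Then

$$|I| \frac{\varepsilon}{M_g} \min_{k \in I} v_f(x^k, x_*) < N \frac{\varepsilon^2}{M_g^2} - \frac{\varepsilon^2}{M_g^2} |J| = \frac{\varepsilon^2}{M_g^2} |I| \quad \Rightarrow \quad \min_{k \in I} v_f(x^k, x_*) < \frac{\varepsilon}{M_g}.$$

To finish the proof we should demonstrate that $|I| \neq 0$. Supposing the reverse we claim that $|I| = 0 \Rightarrow |J| = N$, i.e. all the steps are non-productive, so after using

$$g(x^k) - g(x_*) \geq g(x^k) > \varepsilon \tag{17}$$

we can see that

$$N \frac{\varepsilon^2}{M_g^2} < \sum_{k=0}^{N-1} \frac{\varepsilon}{M_g^2} (g(x^k) - g(x_*)) \leq N \frac{\varepsilon^2}{2 M_g^2} + \Theta_0^2 \leq N \frac{\varepsilon^2}{M_g^2}. \tag{18}$$

So, we have the contradiction. It means that $|I| \neq 0$.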
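The iteration analysed in Theorem 2 can be sketched in a few lines for the Euclidean setup. The code below is an illustration under stated assumptions, not the authors' implementation: it takes $d(x) = \frac{1}{2}\|x\|_2^2$ and $X = \mathbb{R}^n$ (so the proximal step is $x - h \cdot \text{direction}$), uses the stepsizes $\varepsilon / (M_g \|\nabla f(x^k)\|_*)$ and $\varepsilon / M_g^2$, and runs for $\lceil 2 M_g^2 \Theta_0^2 / \varepsilon^2 \rceil$ steps. The function names and the toy problem (a quadratic objective with an $\ell_1$-ball constraint) are hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def norm2(v: Vector) -> float:
    return math.sqrt(sum(t * t for t in v))


def partial_adaptive_md(
    f: Callable[[Vector], float],
    grad_f: Callable[[Vector], Vector],
    g: Callable[[Vector], float],
    subgrad_g: Callable[[Vector], Vector],
    m_g: float,
    theta0_sq: float,
    eps: float,
    x0: Vector,
) -> Tuple[Optional[Vector], int]:
    """Return the best productive point and the number of productive steps."""
    n_steps = math.ceil(2.0 * m_g ** 2 * theta0_sq / eps ** 2)
    x = list(x0)
    best_x: Optional[Vector] = None
    best_f = math.inf
    productive = 0
    for _ in range(n_steps):
        if g(x) <= eps:  # productive step: move along a subgradient of f
            productive += 1
            fx = f(x)
            if fx < best_f:
                best_x, best_f = list(x), fx
            direction = grad_f(x)
            dual_norm = norm2(direction)
            if dual_norm == 0.0:
                break  # x minimises f and is eps-feasible
            h = eps / (m_g * dual_norm)
        else:  # non-productive step: fixed stepsize, no norm of the subgradient of g
            direction = subgrad_g(x)
            h = eps / m_g ** 2
        x = [xi - h * di for xi, di in zip(x, direction)]  # Euclidean Mirr, X = R^n
    return best_x, productive


# Toy problem (hypothetical): minimise 0.5 * ||x - c||^2 subject to ||x||_1 - 1 <= 0.
CENTER = [2.0, 0.0]


def f_example(x: Vector) -> float:
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, CENTER))


def grad_f_example(x: Vector) -> Vector:
    return [xi - ci for xi, ci in zip(x, CENTER)]


def g_example(x: Vector) -> float:
    return sum(abs(t) for t in x) - 1.0


def subgrad_g_example(x: Vector) -> Vector:
    return [float((t > 0) - (t < 0)) for t in x]  # sign vector, 0 at kinks


if __name__ == "__main__":
    x_bar, n_prod = partial_adaptive_md(
        f_example, grad_f_example, g_example, subgrad_g_example,
        m_g=math.sqrt(2.0), theta0_sq=0.5, eps=0.05, x0=[0.0, 0.0],
    )
    print(x_bar, n_prod)
```

For this toy problem the solution is $x_* = (1, 0)$ with $f(x_*) = 0.5$, $M_g = \sqrt{2}$ in the Euclidean norm and $\Theta_0^2 = 0.5$. Note that the non-productive branch never computes $\|\nabla g(x^k)\|_*$, which is the point of the partial adaptivity.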

The following auxiliary assertion (see, e.g. [14, 15]) is fulfilled ($x_*$ is a solution of (2)–(3)).

Lemma 2. Let us introduce the following function:

$$\omega(\tau) = \max_{x \in X} \{f(x) - f(x_*) : \|x - x_*\| \leq \tau\},$$

where $\tau$ is a positive number. Then for any $y \in X$

$$f(y) - f(x_*) \leq \omega(v_f(y, x_*)).$$

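Now we can show how, using the previous assertion and Theorem 2, one can estimate the rate of convergence of Algorithm 2 if the objective function $f$ is differentiable and its gradient satisfies the Lipschitz condition:

$$\|\nabla f(x) - \nabla f(y)\|_* \leq L \|x - y\| \quad \forall x, y \in X.$$

In this case $f(x) \leq f(x_*) + \|\nabla f(x_*)\|_* \|x - x_*\| + \frac{L}{2} \|x - x_*\|^2$, so that $\omega(\tau) \leq \|\nabla f(x_*)\|_* \tau + \frac{L \tau^2}{2}$, and Theorem 2 together with Lemma 2 yields the following statements.

Corollary 1. Let $f$ be differentiable on $X$ with a gradient satisfying the Lipschitz condition above. Then after

$$N = \left\lceil \frac{2 M_g^2 \Theta_0^2}{\varepsilon^2} \right\rceil$$

steps of Algorithm 2 working the next estimate can be fulfilled:

$$\min_{k \in I} f(x^k) - f(x_*) \leq \varepsilon_f + \frac{L \varepsilon^2}{2 M_g^2}, \quad \text{where} \quad \varepsilon_f = \frac{\varepsilon}{M_g} \|\nabla f(x_*)\|_*.$$

Corollary 2. Let $f(x) = \max_{1 \leq i \leq m} f_i(x)$, where each $f_i$ is differentiable at each $x \in X$ and

$$\|\nabla f_i(x) - \nabla f_i(y)\|_* \leq L_i \|x - y\| \quad \forall x, y \in X.$$

Then after

$$N = \left\lceil \frac{2 M_g^2 \Theta_0^2}{\varepsilon^2} \right\rceil$$

steps of Algorithm 2 working the next estimate can be fulfilled:

$$\min_{k \in I} f(x^k) - f(x_*) \leq \varepsilon_f + \frac{L \varepsilon^2}{2 M_g^2}, \quad \text{where} \quad \varepsilon_f = \frac{\varepsilon}{M_g} \|\nabla f(x_*)\|_*, \quad L = \max_{1 \leq i \leq m} L_i.$$

Remark 1. Generally, $\|\nabla f(x_*)\|_* \neq 0$, because we consider some class of constrained problems.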

On the Technique of Restarts in the Considered Version of Mirror Descent for Strongly Convex Problems

In this section, we consider the problem

$$f(x) \to \min, \quad g(x) \leq 0, \quad x \in X \tag{21}$$

with assumption (1) and the additional assumption of strong convexity of $f$ and $g$ with the same parameter $\mu$, i.e.,

$$f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu}{2} \|y - x\|^2, \quad x, y \in X,$$

and the same holds for $g$. We also slightly modify the assumptions on the prox-function $d(x)$. Namely, we assume that $0 = \arg\min_{x \in X} d(x)$ and that $d$ is bounded on the unit ball in the chosen norm $\|\cdot\|_E$, that is

$$d(x) \leq \Theta_0^2 \quad \forall x \in X : \|x\| \leq 1,$$

where $\Theta_0$ is some known number. Finally, we assume that we are given a starting point $x_0 \in X$ and a number $R_0 > 0$ such that $\|x_0 - x_*\|^2 \leq R_0^2$.

To construct a method for solving problem (21) under the stated assumptions, we use the idea of restarting Algorithm 2. The idea of restarting a method for convex problems to obtain a faster rate of convergence for strongly convex problems dates back to the 1980s, see e.g. [12, 13]. To show that restarting the algorithm is also possible for problems with inequality constraints, we rely on the following Lemma (see, e.g. [1]).

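Lemma 3. If $f$ and $g$ are $\mu$-strongly convex functionals with respect to the norm $\|\cdot\|$ on $X$, $x_* = \arg\min_{x \in X} f(x)$, $g(x) \leq 0$ ($\forall x \in X$), and $\varepsilon_f > 0$ and $\varepsilon_g > 0$ are such that

$$f(x) - f(x_*) \leq \varepsilon_f, \quad g(x) \leq \varepsilon_g, \tag{23}$$

then

$$\frac{\mu}{2} \|x - x_*\|^2 \leq \max\{\varepsilon_f, \varepsilon_g\}.$$

In conditions of Corollary 2, after Algorithm 2 stops the inequalities (23) will be true for

$$\varepsilon_f = \frac{\varepsilon}{M_g} \|\nabla f(x_*)\|_* + \frac{\varepsilon^2 L}{2 M_g^2}$$

and $\varepsilon_g = \varepsilon$. Consider the function $\tau : \mathbb{R}^+ \to \mathbb{R}^+$:

$$\tau(\delta) = \max\left\{ \delta \|\nabla f(x_*)\|_* + \frac{\delta^2 L}{2}; \ \delta M_g \right\}.$$

It is clear that $\tau$ increases and therefore for each $\varepsilon > 0$ there exists $\varphi(\varepsilon) > 0$: $\tau(\varphi(\varepsilon)) = \varepsilon$.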
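Since $\tau$ is a maximum of two increasing functions, its inverse can be computed in closed form as the smaller of the two branch-wise inverses. The sketch below assumes the form $\tau(\delta) = \max\{\delta \|\nabla f(x_*)\|_* + \delta^2 L / 2; \ \delta M_g\}$; the argument names (`grad_norm` for $\|\nabla f(x_*)\|_*$, `lip` for $L$, `m_g` for $M_g$) are hypothetical.

```python
import math


def tau(delta: float, grad_norm: float, lip: float, m_g: float) -> float:
    """tau(delta) = max{delta * grad_norm + lip * delta^2 / 2, delta * m_g}."""
    return max(delta * grad_norm + 0.5 * lip * delta ** 2, delta * m_g)


def phi(eps: float, grad_norm: float, lip: float, m_g: float) -> float:
    """Solve tau(delta) = eps for delta > 0 (assumes eps > 0 and m_g > 0)."""
    # Inverse of the linear branch delta * m_g.
    delta_lin = eps / m_g
    # Inverse of the quadratic branch (positive root of the quadratic equation).
    if lip > 0.0:
        delta_quad = (math.sqrt(grad_norm ** 2 + 2.0 * eps * lip) - grad_norm) / lip
    elif grad_norm > 0.0:
        delta_quad = eps / grad_norm
    else:
        delta_quad = math.inf
    # At the smaller root one branch equals eps and the other is not larger.
    return min(delta_lin, delta_quad)


if __name__ == "__main__":
    for params in ((1.0, 2.0, 3.0), (3.0, 2.0, 1.0)):
        d = phi(0.5, *params)
        print(params, d, tau(d, *params))
```

The two parameter sets in the usage example correspond to the cases $\|\nabla f(x_*)\|_* < M_g$ and $\|\nabla f(x_*)\|_* > M_g$.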

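Remark 2. $\varphi(\varepsilon)$ depends on $\|\nabla f(x_*)\|_*$ and the Lipschitz constant $L$ for $\nabla f$. If $\|\nabla f(x_*)\|_* < M_g$ then $\varphi(\varepsilon) = \varepsilon / M_g$ for small enough $\varepsilon$:

$$\frac{\varepsilon}{M_g} < \frac{2 (M_g - \|\nabla f(x_*)\|_*)}{L}.$$

For another case ($\|\nabla f(x_*)\|_* > M_g$) we have $\forall \varepsilon > 0$:

$$\varphi(\varepsilon) = \frac{\sqrt{\|\nabla f(x_*)\|_*^2 + 2 \varepsilon L} - \|\nabla f(x_*)\|_*}{L}.$$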

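Let us consider the following Algorithm 3 for the problem (21).

Algorithm 3 Algorithm for the Strongly Convex Problem

Require: accuracy $\varepsilon > 0$; strong convexity parameter $\mu$; $\Theta_0^2$ s.t. $d(x) \leq \Theta_0^2 \ \forall x \in X : \|x\| \leq 1$; starting point $x_0$ and number $R_0$ s.t. $\|x_0 - x_*\|^2 \leq R_0^2$.
1: Set $d_0(x) = d\left(\frac{x - x_0}{R_0}\right)$.
2: Set $p = 1$.
3: repeat
4: Set $R_p^2 = R_0^2 \cdot 2^{-p}$.
5: Set $\varepsilon_p = \frac{\mu R_p^2}{2}$.
6: Set $x_p$ as the output of Algorithm 2 with accuracy $\varepsilon_p$, prox-function $d_{p-1}(\cdot)$ and $\Theta_0^2$.
7: Set $d_p(x) = d\left(\frac{x - x_p}{R_p}\right)$.
8: Set $p = p + 1$.
9: until $p > \log_2 \frac{\mu R_0^2}{2 \varepsilon}$.
Ensure: $x_p$.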
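The restart schedule itself is independent of how each inner run is carried out. The sketch below implements only the outer loop, under the assumption that restart $p$ uses $R_p^2 = R_0^2 \cdot 2^{-p}$ and the target accuracy $\varepsilon_p = \mu R_p^2 / 2$, and stops once $p > \log_2 \frac{\mu R_0^2}{2\varepsilon}$. The callable `inner` is a hypothetical stand-in for one run of Algorithm 2 started from the previous point with the rescaled prox-function; `toy_inner` is an artificial oracle used only to exercise the schedule.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
# inner(center, radius, accuracy) -> point x with f(x) - f(x_*) <= accuracy and
# g(x) <= accuracy; a hypothetical placeholder for one run of Algorithm 2.
InnerSolver = Callable[[Vector, float, float], Vector]


def restarted_md(
    inner: InnerSolver, x0: Vector, mu: float, r0_sq: float, eps: float
) -> Tuple[Vector, int]:
    """Run the restart schedule and return the last point and the restart count."""
    p_hat = math.log2(mu * r0_sq / (2.0 * eps))
    x = list(x0)
    radius = math.sqrt(r0_sq)  # R_{p-1}: scale of the prox-function d_{p-1}
    restarts = 0
    p = 1
    while True:  # repeat ... until p > p_hat
        r_sq = r0_sq * 2.0 ** (-p)   # R_p^2
        eps_p = mu * r_sq / 2.0      # accuracy requested from the inner run
        x = inner(x, radius, eps_p)
        radius = math.sqrt(r_sq)
        restarts += 1
        p += 1
        if p > p_hat:
            break
    return x, restarts


# Artificial oracle for f(x) = (MU / 2) * ||x - C||^2 with an inactive constraint:
# it returns a point whose objective gap is 0.81 * accuracy.
C = [1.0, -2.0]
MU = 1.0


def toy_inner(center: Vector, radius: float, accuracy: float) -> Vector:
    offset = 0.9 * math.sqrt(2.0 * accuracy / MU)
    return [C[0] + offset, C[1]]


if __name__ == "__main__":
    point, count = restarted_md(toy_inner, [0.0, 0.0], MU, 8.0, 0.125)
    print(point, count)
```

With $\mu = 1$, $R_0^2 = 8$ and $\varepsilon = 0.125$ the schedule performs five restarts, and the last requested accuracy equals $\varepsilon$.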

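The following theorem is fulfilled.

Theorem 3. Let $f$ and $g$ satisfy the conditions of Corollary 2. Let $f$, $g$ be $\mu$-strongly convex functionals on $X \subset \mathbb{R}^n$ and $d(x) \leq \Theta_0^2$ $\forall x \in X$, $\|x\| \leq 1$. Let the starting point $x_0 \in X$ and the number $R_0 > 0$ be given and $\|x_0 - x_*\|^2 \leq R_0^2$. Then for

$$\hat{p} = \log_2 \frac{\mu R_0^2}{2 \varepsilon}$$

$x_{\hat{p}}$ is the $\varepsilon$-solution of Problem (2)–(3), where

$$\|x_{\hat{p}} - x_*\|^2 \leq \frac{2 \varepsilon}{\mu}.$$

At the same time, the total number of iterations of Algorithm 2 does not exceed

$$\hat{p} + \sum_{p=0}^{\hat{p} - 1} \frac{2 \Theta_0^2 M_g^2}{\varphi^2(\varepsilon_{p+1})}, \quad \text{where} \quad \varepsilon_{p+1} = \frac{\mu R_{p+1}^2}{2}.$$

Proof. Let us show by induction that $\|x_p - x_*\|^2 \leq R_p^2$ for all $p \geq 0$. For $p = 0$ this assertion is obvious by virtue of the choice of $x_0$ and $R_0$. Suppose that for some $p$: $\|x_p - x_*\|^2 \leq R_p^2$. Let us prove that $\|x_{p+1} - x_*\|^2 \leq R_{p+1}^2$. We have $d_p(x_*) \leq \Theta_0^2$ and on the $(p+1)$-th restart after no more than

$$\left\lceil \frac{2 \Theta_0^2 M_g^2}{\varphi^2(\varepsilon_{p+1})} \right\rceil$$

iterations of Algorithm 2 the following inequalities are true:

$$f(x_{p+1}) - f(x_*) \leq \varepsilon_{p+1}, \quad g(x_{p+1}) \leq \varepsilon_{p+1} \quad \text{for} \quad \varepsilon_{p+1} = \frac{\mu R_{p+1}^2}{2}.$$

Then, according to Lemma 3,

$$\|x_{p+1} - x_*\|^2 \leq \frac{2 \varepsilon_{p+1}}{\mu} = R_{p+1}^2.$$

So, for all $p \geq 0$

$$\|x_p - x_*\|^2 \leq R_p^2 = \frac{R_0^2}{2^p}, \quad f(x_p) - f(x_*) \leq \frac{\mu}{2} R_0^2 \cdot 2^{-p}, \quad g(x_p) \leq \frac{\mu}{2} R_0^2 \cdot 2^{-p}.$$

For $p = \hat{p} = \log_2 \frac{\mu R_0^2}{2 \varepsilon}$ the following relation is true:

$$\|x_p - x_*\|^2 \leq R_p^2 = R_0^2 \cdot 2^{-p} \leq \frac{2 \varepsilon}{\mu}.$$

It remains only to note that the number of iterations of the work of Algorithm 2 is no more than

$$\sum_{p=0}^{\hat{p} - 1} \left\lceil \frac{2 \Theta_0^2 M_g^2}{\varphi^2(\varepsilon_{p+1})} \right\rceil \leq \hat{p} + \sum_{p=0}^{\hat{p} - 1} \frac{2 \Theta_0^2 M_g^2}{\varphi^2(\varepsilon_{p+1})}.$$

Acknowledgement. The authors are very grateful to Yurii Nesterov, Alexander Gasnikov and Pavel Dvurechensky for fruitful discussions. The authors would like to thank the unknown reviewers for useful comments and suggestions.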

The research by Fedor Stonyakin and Alexander Titov (Algorithm 3, Theorem 3, Corollary 1 and Corollary 2) presented was partially supported by Russian Foundation for Basic Research according to the research project 18-31-00219. The research by Fedor Stonyakin (Algorithm 2 and Theorem 2) presented was partially supported by the grant of the President of the Russian Federation for young candidates of sciences, project no. MK-176.2017.1.

1. Bayandina, A., Gasnikov, A., Gasnikova, E., Matsievsky, S.: Primal-dual mirror descent for the stochastic programming problems with functional constraints. Computational Mathematics and Mathematical Physics (accepted) (2018). https://arxiv.org/pdf/1604.08194.pdf (in Russian)

2. Bayandina, A., Dvurechensky, P., Gasnikov, A., Stonyakin, F., Titov, A.: Mirror descent and convex optimization problems with non-smooth inequality constraints. In: LCCC Focus Period on Large-Scale and Distributed Optimization. Sweden, Lund: Springer (accepted) (2017). https://arxiv.org/abs/1710.06612

3. Beck, A., Ben-Tal, A., Guttmann-Beck, N., Tetruashvili, L.: The comirror algorithm for solving nonsmooth constrained convex problems. Operations Research Letters 38(6), 493–498 (2010)

4. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)

5. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. Society for Industrial and Applied Mathematics, Philadelphia (2001)

6. Ben-Tal, A., Nemirovski, A.: Robust Truss Topology Design via semidefinite programming. SIAM Journal on Optimization 7(4), 991–1016 (1997)

7. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, New York (2004)

8. Juditsky, A., Nemirovski, A.: First order methods for non-smooth convex large-scale optimization. I: general purpose methods. Optimization for Machine Learning, S. Sra et al. (eds.), 121–184. Cambridge, MA: MIT Press (2012)

9. Juditsky, A., Nesterov, Y.: Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization. Stochastic Systems 4(1), 44–80 (2014)

10. Nemirovskii, A.: Efficient methods for large-scale convex optimization problems. Ekonomika i Matematicheskie Metody (1979) (in Russian)

11. Nemirovskii, A., Nesterov, Y.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) (in Russian)

12. Nemirovsky, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. J. Wiley & Sons, New York (1983)

13. Nesterov, Y.: A method of solving a convex programming problem with convergence rate $O(1/k^2)$. Soviet Mathematics Doklady 27(2), 372–376 (1983)

14. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Massachusetts (2004)

15. Nesterov, Y.: Subgradient methods for convex functions with nonstandard growth properties: https://www.mathnet.ru:8080/PresentFiles/16179/growthbm_nesterov.pdf, [Online; accessed 01-April-2018]

16. Polyak, B.: A general method of solving extremum problems. Soviet Mathematics Doklady 8(3), 593–597 (1967) (in Russian)

17. Shor, N. Z.: Generalized gradient descent with application to block programming. Kibernetika 3(3), 53–55 (1967) (in Russian)

18. Shpirko, S., Nesterov, Y.: Primal-dual subgradient methods for huge-scale linear conic problem. SIAM Journal on Optimization 24(3), 1444–1457 (2014)

19. Vasilyev, F.: Optimization Methods. Fizmatlit, Moscow (2002) (in Russian)