One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

Fedor S. Stonyakin¹ and Alexander A. Titov²

¹ V. I. Vernadsky Crimean Federal University, Simferopol, Russia
fedyor@mail.ru
² Moscow Institute of Physics and Technologies, Moscow, Russia
a.a.titov@phystech.edu

Abstract. The paper is devoted to a special Mirror Descent algorithm for problems of convex minimization with functional constraints. The objective function may not satisfy the Lipschitz condition, but it must have a Lipschitz-continuous gradient. We assume that the functional constraint can be non-smooth, but Lipschitz-continuous. In particular, such functionals appear in the well-known Truss Topology Design problem. We also apply the technique of restarts to the mentioned version of Mirror Descent for strongly convex problems. Convergence rate estimates are obtained for the Mirror Descent algorithms under consideration.

Keywords: Adaptive Mirror Descent algorithm · Lipschitz-continuous gradient · Technique of restarts

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: S. Belim et al. (eds.): OPTA-SCL 2018, Omsk, Russia, published at http://ceur-ws.org

1 Introduction

The optimization of non-smooth functionals with constraints attracts widespread interest in large-scale optimization and its applications [6, 18]. There are various methods for solving this kind of optimization problem, for example the bundle-level method [14], the penalty method [19], and the method of Lagrange multipliers [7]. Among them, Mirror Descent (MD) [4, 12] is viewed as a simple method for non-smooth convex optimization.

We consider objective functionals which do not satisfy the usual Lipschitz property (or whose Lipschitz constant is quite large), but which have a Lipschitz-continuous gradient; quadratic functionals are a typical example. For such problems the ideas of [14, 15] were adopted in ([2], item 3.3) to construct an adaptive version of the Mirror Descent algorithm. For example, let $A_i$ ($i = 1, \ldots, m$) be positive semidefinite matrices, $x^\top A_i x \ge 0$ for all $x$, and consider the objective function

$$f(x) = \max_{1 \le i \le m} f_i(x), \quad \text{where} \quad f_i(x) = \frac{1}{2}\langle A_i x, x\rangle - \langle b_i, x\rangle + \alpha_i, \quad i = 1, \ldots, m.$$

Note that such functionals appear in the Truss Topology Design problem with weights of the bars [15].

In this paper we propose a partially adaptive (with respect to the objective functional) version of the algorithm from ([2], item 3.3). It simplifies work on problems where calculating the norm of the subgradient of the functional constraint is burdensome in view of the large number of constraints. The idea of restarts [9] is adopted to construct the proposed algorithm in the case of strongly convex objective and constraints. It is well known that both considered methods are optimal in terms of the lower bounds [12].

Note that a functional constraint can, in general, be non-smooth. That is why we consider subgradient methods. These methods have a long history, starting with the method for deterministic unconstrained problems in the Euclidean setting [17] and its generalization to constrained problems [16], where the idea of switching steps between the direction of a subgradient of the objective and the direction of a subgradient of the constraint was suggested. The non-Euclidean extension, usually referred to as Mirror Descent, originated in [10, 12] and was later analyzed in [4]. An extension for constrained problems was proposed in [12]; see also the recent version in [3]. To obtain a faster convergence rate of Mirror Descent for a strongly convex objective in the unconstrained case, the restart technique [11-13] was used in [8]. Usually, the stepsize and stopping rule for Mirror Descent require knowledge of the Lipschitz constant of the objective function and of the constraint, if any.
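The max-of-quadratics objective above can be sketched in code. The following is a minimal NumPy-based illustration (all names and data are ours, not from the paper): it evaluates $f$ and returns a subgradient of the pointwise maximum, namely the gradient $A_i x - b_i$ of a piece attaining the maximum.

```python
import numpy as np

def make_max_quadratic(As, bs, alphas):
    """f(x) = max_i f_i(x) with f_i(x) = 0.5*<A_i x, x> - <b_i, x> + alpha_i."""
    def f(x):
        return max(0.5 * x @ A @ x - b @ x + a
                   for A, b, a in zip(As, bs, alphas))

    def subgrad(x):
        # A subgradient of a pointwise maximum of smooth functions:
        # the gradient of any piece attaining the max, here A_i x - b_i.
        vals = [0.5 * x @ A @ x - b @ x + a
                for A, b, a in zip(As, bs, alphas)]
        i = int(np.argmax(vals))
        return As[i] @ x - bs[i]

    return f, subgrad
```

On an unbounded set such an $f$ is generally not Lipschitz, while each piece $f_i$ has a Lipschitz-continuous gradient with constant $L_i = \|A_i\|$, which is exactly the non-standard growth setting considered in the paper.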
Adaptive stepsizes, which do not require this information, are considered in [5] for problems without inequality constraints and in [3] for constrained problems.

We consider Mirror Descent algorithms for constrained problems in the case of non-standard growth properties of the objective functional (it has a Lipschitz-continuous gradient).

The paper consists of this Introduction and three main sections. In Section 2 we give basic notation concerning convex optimization problems with functional constraints. In Section 3 we describe a partially adaptive version (Algorithm 2) of the Mirror Descent algorithm from ([2], item 3.3) and prove estimates for its rate of convergence. The last Section 4 is focused on the strongly convex case, with restarts of Algorithm 2 and the corresponding theoretical estimates for the rate of convergence.

2 Problem Statement and Standard Mirror Descent Basics

Let $(E, \|\cdot\|)$ be a normed finite-dimensional vector space and let $E^*$ be the conjugate space of $E$ with the norm

$$\|y\|_* = \max_{x}\{\langle y, x\rangle : \|x\| \le 1\},$$

where $\langle y, x\rangle$ is the value of the continuous linear functional $y$ at $x \in E$.

Let $X \subset E$ be a (simple) closed convex set. We consider two convex subdifferentiable functionals $f, g : X \to \mathbb{R}$. Also, we assume that $g$ is Lipschitz-continuous:

$$|g(x) - g(y)| \le M_g \|x - y\| \quad \forall x, y \in X. \tag{1}$$

We focus on the following type of convex optimization problem:

$$f(x) \to \min_{x \in X}, \tag{2}$$

$$\text{s.t.} \quad g(x) \le 0. \tag{3}$$

Let $d : X \to \mathbb{R}$ be a distance generating function (d.g.f.), which is continuously differentiable and 1-strongly convex w.r.t. the norm $\|\cdot\|$, i.e.

$$\langle \nabla d(x) - \nabla d(y), x - y\rangle \ge \|x - y\|^2 \quad \forall x, y \in X,$$

and assume that $\min_{x \in X} d(x) = d(0)$. Suppose we have a constant $\Theta_0$ such that $d(x_*) \le \Theta_0^2$, where $x_*$ is a solution of (2)-(3). Note that if there is a set $X_*$ of optimal points, we may assume that

$$\min_{x_* \in X_*} d(x_*) \le \Theta_0^2.$$

For all $x, y \in X$ consider the corresponding Bregman divergence

$$V(x, y) = d(y) - d(x) - \langle \nabla d(x), y - x\rangle.$$
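For the Euclidean setup, $d(x) = \frac{1}{2}\|x\|_2^2$, the Bregman divergence above reduces to $V(x, y) = \frac{1}{2}\|y - x\|_2^2$. A minimal NumPy-based sketch (helper names are ours, not from the paper):

```python
import numpy as np

def d(x):                     # Euclidean d.g.f.: d(x) = 0.5*||x||_2^2
    return 0.5 * float(x @ x)

def grad_d(x):                # its gradient
    return x

def bregman(x, y):
    """V(x, y) = d(y) - d(x) - <grad d(x), y - x>."""
    return d(y) - d(x) - float(grad_d(x) @ (y - x))
```

For this choice `bregman(x, y)` equals `0.5 * np.sum((y - x)**2)`, consistent with the 1-strong convexity of $d$ w.r.t. $\|\cdot\|_2$.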
Standard proximal setups, i.e. Euclidean, entropy, $\ell_1/\ell_2$, simplex, nuclear norm, and spectahedron, can be found, e.g., in [5]. Let us define the proximal mapping operator in the standard way for each $x \in X$ and $p \in E^*$:

$$\text{Mirr}_x(p) = \arg\min_{u \in X}\{\langle p, u\rangle + V(x, u)\}.$$

We make the simplicity assumption, which means that $\text{Mirr}_x(p)$ is easily computable.

3 Some Mirror Descent Algorithm for the Type of Problems Under Consideration

Following [14], given a function $f$, for each $x \in X$ with subgradient $\nabla f(x)$ and each $y \in X$ we define

$$v_f(x, y) = \begin{cases} \left\langle \dfrac{\nabla f(x)}{\|\nabla f(x)\|_*},\, x - y \right\rangle, & \nabla f(x) \ne 0, \\[2mm] 0, & \nabla f(x) = 0, \end{cases} \qquad x \in X. \tag{4}$$

In ([2], item 3.3) the following adaptive Mirror Descent algorithm for Problem (2)-(3) was proposed by the first author.

Algorithm 1 Adaptive Mirror Descent, Non-Standard Growth
Require: $\varepsilon$, $\Theta_0^2$, $X$, $d(\cdot)$
1: $x^0 = \arg\min_{x \in X} d(x)$
2: $I := \emptyset$
3: $N \leftarrow 0$
4: repeat
5:   if $g(x^N) \le \varepsilon$ then
6:     $h_N \leftarrow \dfrac{\varepsilon}{\|\nabla f(x^N)\|_*}$
7:     $x^{N+1} \leftarrow \text{Mirr}_{x^N}(h_N \nabla f(x^N))$ ("productive step")
8:     $N \to I$
9:   else ($g(x^N) > \varepsilon$)
10:    $h_N \leftarrow \dfrac{\varepsilon}{\|\nabla g(x^N)\|_*^2}$
11:    $x^{N+1} \leftarrow \text{Mirr}_{x^N}(h_N \nabla g(x^N))$ ("non-productive step")
12:  end if
13:  $N \leftarrow N + 1$
14: until $\Theta_0^2 \le \dfrac{\varepsilon^2}{2}|I| + \sum_{k \notin I} \dfrac{\varepsilon^2}{2\|\nabla g(x^k)\|_*^2}$
Ensure: $\bar{x} := \arg\min_{x^k,\, k \in I} f(x^k)$

For this method the following result was obtained in [2].

Theorem 1. Let $\varepsilon > 0$ be a fixed positive number and let Algorithm 1 work

$$N = \left\lceil \frac{2\max\{1, M_g^2\}\Theta_0^2}{\varepsilon^2} \right\rceil \tag{5}$$

steps. Then

$$\min_{k \in I} v_f(x^k, x_*) < \varepsilon. \tag{6}$$

Let us recall one well-known statement (see, e.g., [5]).

Lemma 1. Let $f : X \to \mathbb{R}$ be a convex subdifferentiable function over the convex set $X$ and let a sequence $\{x^k\}$ be defined by the relation

$$x^{k+1} = \text{Mirr}_{x^k}(h_k \nabla f(x^k)).$$

Then for each $x \in X$

$$h_k \langle \nabla f(x^k), x^k - x\rangle \le \frac{h_k^2}{2}\|\nabla f(x^k)\|_*^2 + V(x^k, x) - V(x^{k+1}, x). \tag{7}$$

The following Algorithm 2 is proposed by us for Problem (2)-(3).
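Algorithm 1 above can be sketched in the Euclidean setup, where $\text{Mirr}_x(p)$ reduces to a projection of $x - p$ onto $X$. The following NumPy-based code is an illustration only, not the paper's implementation: the function names, the default identity projection (i.e. $X = \mathbb{R}^n$), and the iteration cap are our assumptions.

```python
import numpy as np

def adaptive_md(f, grad_f, g, grad_g, x0, eps, theta0_sq,
                proj=lambda x: x, max_iter=100000):
    """Sketch of Algorithm 1 (Euclidean setup: Mirr_x(p) = proj(x - p))."""
    x = x0.astype(float).copy()
    productive = []      # iterates of productive steps (index set I)
    nonprod_sum = 0.0    # running sum of eps^2 / (2*||grad g(x^k)||^2)
    for _ in range(max_iter):
        if g(x) <= eps:                        # "productive" step
            gf = grad_f(x)
            h = eps / np.linalg.norm(gf)
            productive.append(x.copy())
            x = proj(x - h * gf)
        else:                                  # "non-productive" step
            gg = grad_g(x)
            h = eps / np.linalg.norm(gg) ** 2
            x = proj(x - h * gg)
            nonprod_sum += eps ** 2 / (2 * np.linalg.norm(gg) ** 2)
        # stopping rule: Theta_0^2 <= (eps^2/2)|I| + sum over non-productive k
        if theta0_sq <= 0.5 * eps ** 2 * len(productive) + nonprod_sum:
            break
    # Theorem 1 guarantees I is non-empty for N as in (5)
    return min(productive, key=f)
```

The normalized productive stepsize $h_N = \varepsilon / \|\nabla f(x^N)\|_*$ is what makes the method insensitive to a large (or absent) Lipschitz constant of $f$ itself.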
Algorithm 2 Partial Adaptive Version of Algorithm 1
Require: $\varepsilon$, $\Theta_0^2$, $X$, $d(\cdot)$
1: $x^0 = \arg\min_{x \in X} d(x)$
2: $I := \emptyset$
3: $N \leftarrow 0$
4: repeat
5:   if $g(x^N) \le \varepsilon$ then
6:     $h_N \leftarrow \dfrac{\varepsilon}{M_g\|\nabla f(x^N)\|_*}$
7:     $x^{N+1} \leftarrow \text{Mirr}_{x^N}(h_N \nabla f(x^N))$ ("productive step")
8:     $N \to I$
9:   else ($g(x^N) > \varepsilon$)
10:    $h_N \leftarrow \dfrac{\varepsilon}{M_g^2}$
11:    $x^{N+1} \leftarrow \text{Mirr}_{x^N}(h_N \nabla g(x^N))$ ("non-productive step")
12:  end if
13:  $N \leftarrow N + 1$
14: until $N = \left\lceil \dfrac{2M_g^2\Theta_0^2}{\varepsilon^2} \right\rceil$
Ensure: $\bar{x}^N := \arg\min_{x^k,\, k \in I} f(x^k)$

Set $[N] = \{0, 1, \ldots, N-1\}$ and $J = [N] \setminus I$, where $I$ is the collection of indexes of productive steps, with

$$h_k = \frac{\varepsilon}{M_g \|\nabla f(x^k)\|_*}, \tag{8}$$

and $|I|$ is the number of productive steps. Similarly, for the non-productive steps $k \in J$ the stepsize is

$$h_k = \frac{\varepsilon}{M_g^2}, \tag{9}$$

and $|J|$ is the number of non-productive steps. Obviously,

$$|I| + |J| = N. \tag{10}$$

Let us formulate the following analogue of Theorem 1.

Theorem 2. Let $\varepsilon > 0$ be a fixed positive number and let Algorithm 2 work

$$N = \left\lceil \frac{2M_g^2\Theta_0^2}{\varepsilon^2} \right\rceil \tag{11}$$

steps. Then

$$\min_{k \in I} v_f(x^k, x_*) < \frac{\varepsilon}{M_g}. \tag{12}$$

Proof. 1) For the productive steps, from (7) and (8) one gets

$$h_k \langle \nabla f(x^k), x^k - x\rangle \le \frac{h_k^2}{2}\|\nabla f(x^k)\|_*^2 + V(x^k, x) - V(x^{k+1}, x).$$

Taking into account $\frac{h_k^2}{2}\|\nabla f(x^k)\|_*^2 = \frac{\varepsilon^2}{2M_g^2}$, we have

$$h_k \langle \nabla f(x^k), x^k - x\rangle = \frac{\varepsilon}{M_g}\left\langle \frac{\nabla f(x^k)}{\|\nabla f(x^k)\|_*},\, x^k - x\right\rangle = \frac{\varepsilon}{M_g} v_f(x^k, x). \tag{13}$$

2) Similarly, for the non-productive steps $k \in J$:

$$h_k (g(x^k) - g(x)) \le \frac{h_k^2}{2}\|\nabla g(x^k)\|_*^2 + V(x^k, x) - V(x^{k+1}, x).$$

Using (1), which implies $\|\nabla g(x)\|_* \le M_g$, we have

$$h_k (g(x^k) - g(x)) \le \frac{\varepsilon^2}{2M_g^2} + V(x^k, x) - V(x^{k+1}, x). \tag{14}$$

3) From (13) and (14) with $x = x_*$ we have

$$\frac{\varepsilon}{M_g}\sum_{k \in I} v_f(x^k, x_*) + \frac{\varepsilon}{M_g^2}\sum_{k \in J}\bigl(g(x^k) - g(x_*)\bigr) \le N\frac{\varepsilon^2}{2M_g^2} + \sum_{k=0}^{N-1}\bigl(V(x^k, x_*) - V(x^{k+1}, x_*)\bigr). \tag{15}$$

Note that for any $k \in J$

$$g(x^k) - g(x_*) \ge g(x^k) > \varepsilon,$$

and in view of the telescoping bound

$$\sum_{k=0}^{N-1}\bigl(V(x^k, x_*) - V(x^{k+1}, x_*)\bigr) = V(x^0, x_*) - V(x^N, x_*) \le \Theta_0^2,$$

the inequality (15) can be transformed in the following way:

$$\frac{\varepsilon}{M_g}\sum_{k \in I} v_f(x^k, x_*) \le N\frac{\varepsilon^2}{2M_g^2} + \Theta_0^2 - \frac{\varepsilon^2}{M_g^2}|J|.$$

On the other hand,

$$\sum_{k \in I} v_f(x^k, x_*) \ge |I| \min_{k \in I} v_f(x^k, x_*).$$

Assume that

$$\frac{\varepsilon^2}{2M_g^2} N \ge \Theta_0^2, \quad \text{or} \quad N \ge \frac{2M_g^2\Theta_0^2}{\varepsilon^2}. \tag{16}$$

Thus,

$$\frac{\varepsilon}{M_g}|I|\min_{k \in I} v_f(x^k, x_*) < N\frac{\varepsilon^2}{2M_g^2} - \frac{\varepsilon^2}{M_g^2}|J| + \Theta_0^2 \le N\frac{\varepsilon^2}{M_g^2} - \frac{\varepsilon^2}{M_g^2}|J| = \frac{\varepsilon^2}{M_g^2}|I|,$$

whence

$$\frac{\varepsilon}{M_g}|I|\min_{k \in I} v_f(x^k, x_*) < \frac{\varepsilon^2}{M_g^2}|I| \;\Rightarrow\; \min_{k \in I} v_f(x^k, x_*) < \frac{\varepsilon}{M_g}. \tag{17}$$

To finish the proof we should show that $|I| \ne 0$. Suppose the contrary: $|I| = 0$, so $|J| = N$, i.e. all the steps are non-productive. Then, using $g(x^k) - g(x_*) \ge g(x^k) > \varepsilon$, we see that

$$\sum_{k=0}^{N-1} h_k \bigl(g(x^k) - g(x_*)\bigr) \le N\frac{\varepsilon^2}{2M_g^2} + \Theta_0^2 \le N\frac{\varepsilon^2}{2M_g^2} + N\frac{\varepsilon^2}{2M_g^2} = N\frac{\varepsilon^2}{M_g^2}.$$

So,

$$\frac{\varepsilon}{M_g^2}\sum_{k=0}^{N-1}\bigl(g(x^k) - g(x_*)\bigr) \le N\frac{\varepsilon^2}{M_g^2}$$

and

$$N\varepsilon < \sum_{k=0}^{N-1}\bigl(g(x^k) - g(x_*)\bigr) \le N\varepsilon.$$

This contradiction shows that $|I| \ne 0$. □

The following auxiliary assertion holds (see, e.g., [14, 15]); here $x_*$ is a solution of (2)-(3).

Lemma 2. Introduce the function

$$\omega(\tau) = \max_{x \in X}\{f(x) - f(x_*) : \|x - x_*\| \le \tau\}, \tag{18}$$

where $\tau$ is a positive number. Then for any $y \in X$

$$f(y) - f(x_*) \le \omega(v_f(y, x_*)). \tag{19}$$

Now we can show how, using the previous assertion and Theorem 2, one can estimate the rate of convergence of Algorithm 2 if the objective function $f$ is differentiable and its gradient satisfies the Lipschitz condition

$$\|\nabla f(x) - \nabla f(y)\|_* \le L\|x - y\| \quad \forall x, y \in X. \tag{20}$$

Using the well-known fact

$$f(x) \le f(x_*) + \|\nabla f(x_*)\|_* \|x - x_*\| + \frac{1}{2}L\|x - x_*\|^2,$$

we get

$$\min_{k \in I} f(x^k) - f(x_*) \le \min_{k \in I}\left\{\|\nabla f(x_*)\|_* \|x^k - x_*\| + \frac{1}{2}L\|x^k - x_*\|^2\right\}.$$

So,

$$\min_{k \in I} f(x^k) - f(x_*) \le \|\nabla f(x_*)\|_*\frac{\varepsilon}{M_g} + \frac{L\varepsilon^2}{2M_g^2} = \varepsilon_f + \frac{L\varepsilon^2}{2M_g^2}, \quad \text{where} \quad \varepsilon_f = \frac{\varepsilon}{M_g}\|\nabla f(x_*)\|_*.$$

That is why the following result holds.

Corollary 1. Let $f$ be differentiable on $X$ and let (20) hold. Then after

$$N = \left\lceil \frac{2M_g^2\Theta_0^2}{\varepsilon^2} \right\rceil$$

steps of Algorithm 2 the following estimate holds:

$$\min_{0 \le k \le N} f(x^k) - f(x_*) \le \varepsilon_f + \frac{L}{2}\frac{\varepsilon^2}{M_g^2}, \quad \text{where} \quad \varepsilon_f = \frac{\varepsilon}{M_g}\|\nabla f(x_*)\|_*.$$

We can apply our method to a class of problems with non-smooth objective functionals of a special form.

Corollary 2. Assume that $f(x) = \max_{i=1,\ldots,m} f_i(x)$, where each $f_i$ is differentiable at each $x \in X$ and

$$\|\nabla f_i(x) - \nabla f_i(y)\|_* \le L_i\|x - y\| \quad \forall x, y \in X.$$

Then after

$$N = \left\lceil \frac{2M_g^2\Theta_0^2}{\varepsilon^2} \right\rceil$$

steps of Algorithm 2 the following estimate holds:

$$\min_{0 \le k \le N} f(x^k) - f(x_*) \le \varepsilon_f + \frac{L}{2}\frac{\varepsilon^2}{M_g^2}, \quad \text{where} \quad \varepsilon_f = \frac{\varepsilon}{M_g}\|\nabla f(x_*)\|_*, \quad L = \max_{i=1,\ldots,m} L_i.$$

Remark 1. Generally, $\|\nabla f(x_*)\|_* \ne 0$, because we consider a class of constrained problems.

4 On the Technique of Restarts in the Considered Version of Mirror Descent for Strongly Convex Problems

In this section we consider the problem

$$f(x) \to \min, \quad g(x) \le 0, \quad x \in X \tag{21}$$

with assumption (1) and the additional assumption of strong convexity of $f$ and $g$ with the same parameter $\mu$, i.e.

$$f(y) \ge f(x) + \langle \nabla f(x), y - x\rangle + \frac{\mu}{2}\|y - x\|^2, \quad x, y \in X,$$

and the same holds for $g$. We also slightly modify the assumptions on the prox-function $d(x)$. Namely, we assume that $0 = \arg\min_{x \in X} d(x)$ and that $d$ is bounded on the unit ball in the chosen norm $\|\cdot\|_E$, that is,

$$d(x) \le \frac{\Omega}{2} \quad \forall x \in X : \|x\| \le 1, \tag{22}$$

where $\Omega$ is some known number. Finally, we assume that we are given a starting point $x_0 \in X$ and a number $R_0 > 0$ such that $\|x_0 - x_*\|^2 \le R_0^2$.

To construct a method for solving problem (21) under the stated assumptions, we use the idea of restarting Algorithm 2. The idea of restarting a method for convex problems to obtain a faster rate of convergence for strongly convex problems dates back to the 1980s; see, e.g., [12, 13]. To show that restarting is also possible for problems with inequality constraints, we rely on the following lemma (see, e.g., [1]).

Lemma 3. Let $f$ and $g$ be $\mu$-strongly convex functionals with respect to the norm $\|\cdot\|$ on $X$, let $x_* = \arg\min_{x \in X}\{f(x) : g(x) \le 0\}$, and let $\varepsilon_f > 0$ and $\varepsilon_g > 0$ be such that

$$f(x) - f(x_*) \le \varepsilon_f, \quad g(x) \le \varepsilon_g. \tag{23}$$

Then

$$\frac{\mu}{2}\|x - x_*\|^2 \le \max\{\varepsilon_f, \varepsilon_g\}. \tag{24}$$

Under the conditions of Corollary 2, after Algorithm 2 stops, the inequalities (23) hold with

$$\varepsilon_f = \frac{\varepsilon}{M_g}\|\nabla f(x_*)\|_* + \frac{L\varepsilon^2}{2M_g^2} \quad \text{and} \quad \varepsilon_g = \varepsilon.$$

Consider the function $\tau : \mathbb{R}_+ \to \mathbb{R}_+$:

$$\tau(\delta) = \max\left\{\delta\|\nabla f(x_*)\|_* + \frac{\delta^2 L}{2};\; \delta M_g\right\}.$$

It is clear that $\tau$ is increasing, and therefore for each $\varepsilon > 0$ there exists $\varphi(\varepsilon) > 0$ such that $\tau(\varphi(\varepsilon)) = \varepsilon$.

Remark 2. $\varphi(\varepsilon)$ depends on $\|\nabla f(x_*)\|_*$ and on the Lipschitz constant $L$ of $\nabla f$. If $\|\nabla f(x_*)\|_* < M_g$, then the second term in $\tau$ is the active one for small $\delta$, so $\varphi(\varepsilon) = \varepsilon / M_g$ for small enough $\varepsilon$, namely for

$$\varepsilon < \frac{2M_g(M_g - \|\nabla f(x_*)\|_*)}{L}.$$

In the other case ($\|\nabla f(x_*)\|_* > M_g$) we have for all $\varepsilon > 0$:

$$\varphi(\varepsilon) = \frac{\sqrt{\|\nabla f(x_*)\|_*^2 + 2\varepsilon L} - \|\nabla f(x_*)\|_*}{L}.$$

Let us consider the following Algorithm 3 for problem (21).

Algorithm 3 Algorithm for the Strongly Convex Problem
Require: accuracy $\varepsilon > 0$; strong convexity parameter $\mu$; $\Theta_0^2$ s.t. $d(x) \le \Theta_0^2$ $\forall x \in X : \|x\| \le 1$; starting point $x_0$ and number $R_0$ s.t. $\|x_0 - x_*\|^2 \le R_0^2$.
1: Set $d_0(x) = d\left(\frac{x - x_0}{R_0}\right)$.
2: Set $p = 1$.
3: repeat
4:   Set $R_p^2 = R_0^2 \cdot 2^{-p}$.
5:   Set $\varepsilon_p = \frac{\mu R_p^2}{2}$.
6:   Set $x_p$ as the output of Algorithm 2 with accuracy $\varepsilon_p$, prox-function $d_{p-1}(\cdot)$ and $\Theta_0^2$.
7:   $d_p(x) \leftarrow d\left(\frac{x - x_p}{R_p}\right)$.
8:   Set $p = p + 1$.
9: until $p > \log_2 \frac{\mu R_0^2}{2\varepsilon}$.
Ensure: $x_p$.

The following theorem holds.

Theorem 3. Let $f$ and $g$ satisfy the conditions of Corollary 2, and let $f, g$ be $\mu$-strongly convex functionals on $X \subset \mathbb{R}^n$ with $d(x) \le \Theta_0^2$ for all $x \in X$ such that $\|x\| \le 1$. Let the starting point $x_0 \in X$ and the number $R_0 > 0$ be given with $\|x_0 - x_*\|^2 \le R_0^2$. Then for

$$\hat{p} = \left\lceil \log_2 \frac{\mu R_0^2}{2\varepsilon} \right\rceil$$

$x_{\hat{p}}$ is an $\varepsilon$-solution of Problem (2)-(3), in the sense that

$$\|x_{\hat{p}} - x_*\|^2 \le \frac{2\varepsilon}{\mu}.$$

At the same time, the total number of iterations of Algorithm 2 does not exceed

$$\hat{p} + \sum_{p=1}^{\hat{p}} \frac{2\Theta_0^2 M_g^2}{\varphi^2(\varepsilon_p)}, \quad \text{where} \quad \varepsilon_p = \frac{\mu R_0^2}{2^{p+1}}.$$

Proof. The function $d_p(x)$ ($p = 0, 1, 2, \ldots$) is 1-strongly convex with respect to the norm $\frac{\|\cdot\|}{R_p}$. By the method of mathematical induction we show that

$$\|x_p - x_*\|^2 \le R_p^2 \quad \forall p \ge 0.$$

For $p = 0$ this assertion is obvious by virtue of the choice of $x_0$ and $R_0$.
Suppose that for some $p$ we have $\|x_p - x_*\|^2 \le R_p^2$. Let us prove that $\|x_{p+1} - x_*\|^2 \le R_{p+1}^2$. We have $d_p(x_*) \le \Theta_0^2$, and on the $(p+1)$-th restart, after no more than

$$\left\lceil \frac{2\Theta_0^2 M_g^2}{\varphi^2(\varepsilon_{p+1})} \right\rceil$$

iterations of Algorithm 2, the following inequalities are true:

$$f(x_{p+1}) - f(x_*) \le \varepsilon_{p+1}, \quad g(x_{p+1}) \le \varepsilon_{p+1} \quad \text{for} \quad \varepsilon_{p+1} = \frac{\mu R_{p+1}^2}{2}.$$

Then, according to Lemma 3,

$$\|x_{p+1} - x_*\|^2 \le \frac{2\varepsilon_{p+1}}{\mu} = R_{p+1}^2.$$

So, for all $p \ge 0$

$$\|x_p - x_*\|^2 \le R_p^2 = \frac{R_0^2}{2^p}, \quad f(x_p) - f(x_*) \le \frac{\mu R_0^2}{2^{p+1}}, \quad g(x_p) \le \frac{\mu R_0^2}{2^{p+1}}.$$

For $p = \hat{p} = \left\lceil \log_2 \frac{\mu R_0^2}{2\varepsilon} \right\rceil$ the following relation is true:

$$\|x_{\hat{p}} - x_*\|^2 \le R_{\hat{p}}^2 = R_0^2 \cdot 2^{-\hat{p}} \le \frac{2\varepsilon}{\mu}.$$

It remains only to note that the total number of iterations of Algorithm 2 is no more than

$$\sum_{p=1}^{\hat{p}} \left\lceil \frac{2\Theta_0^2 M_g^2}{\varphi^2(\varepsilon_p)} \right\rceil \le \hat{p} + \sum_{p=1}^{\hat{p}} \frac{2\Theta_0^2 M_g^2}{\varphi^2(\varepsilon_p)}. \quad \square$$

Acknowledgement. The authors are very grateful to Yurii Nesterov, Alexander Gasnikov and Pavel Dvurechensky for fruitful discussions, and would like to thank the anonymous reviewers for useful comments and suggestions. The research by Fedor Stonyakin and Alexander Titov presented here (Algorithm 3, Theorem 3, Corollary 1 and Corollary 2) was partially supported by the Russian Foundation for Basic Research, research project 18-31-00219. The research by Fedor Stonyakin presented here (Algorithm 2 and Theorem 2) was partially supported by the grant of the President of the Russian Federation for young candidates of sciences, project no. MK-176.2017.1.

References

1. Bayandina, A., Gasnikov, A., Gasnikova, E., Matsievsky, S.: Primal-dual mirror descent for the stochastic programming problems with functional constraints. Computational Mathematics and Mathematical Physics (accepted) (2018). https://arxiv.org/pdf/1604.08194.pdf (in Russian)
2. Bayandina, A., Dvurechensky, P., Gasnikov, A., Stonyakin, F., Titov, A.: Mirror descent and convex optimization problems with non-smooth inequality constraints.
In: LCCC Focus Period on Large-Scale and Distributed Optimization. Lund, Sweden: Springer (accepted) (2017). https://arxiv.org/abs/1710.06612
3. Beck, A., Ben-Tal, A., Guttmann-Beck, N., Tetruashvili, L.: The comirror algorithm for solving nonsmooth constrained convex problems. Operations Research Letters 38(6), 493–498 (2010)
4. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
5. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. Society for Industrial and Applied Mathematics, Philadelphia (2001)
6. Ben-Tal, A., Nemirovski, A.: Robust Truss Topology Design via semidefinite programming. SIAM Journal on Optimization 7(4), 991–1016 (1997)
7. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, New York (2004)
8. Juditsky, A., Nemirovski, A.: First order methods for non-smooth convex large-scale optimization. I: General purpose methods. In: Sra, S., et al. (eds.) Optimization for Machine Learning, pp. 121–184. MIT Press, Cambridge, MA (2012)
9. Juditsky, A., Nesterov, Y.: Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization. Stochastic Systems 4(1), 44–80 (2014)
10. Nemirovskii, A.: Efficient methods for large-scale convex optimization problems. Ekonomika i Matematicheskie Metody (1979) (in Russian)
11. Nemirovskii, A., Nesterov, Y.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) (in Russian)
12. Nemirovsky, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. J. Wiley & Sons, New York (1983)
13. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady 27(2), 372–376 (1983)
14. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Massachusetts (2004)
15. Nesterov, Y.: Subgradient methods for convex functions with nonstandard growth properties. https://www.mathnet.ru:8080/PresentFiles/16179/growthbm_nesterov.pdf [Online; accessed 01-April-2018]
16. Polyak, B.: A general method of solving extremum problems. Soviet Mathematics Doklady 8(3), 593–597 (1967) (in Russian)
17. Shor, N. Z.: Generalized gradient descent with application to block programming. Kibernetika 3(3), 53–55 (1967) (in Russian)
18. Shpirko, S., Nesterov, Y.: Primal-dual subgradient methods for huge-scale linear conic problems. SIAM Journal on Optimization 24(3), 1444–1457 (2014)
19. Vasilyev, F.: Optimization Methods. Fizmatlit, Moscow (2002) (in Russian)