<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fedor S. Stonyakin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander A. Titov</string-name>
          <email>a.a.titov@phystech.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Moscow Institute of Physics and Technologies</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>V. I. Vernadsky Crimean Federal University</institution>
          ,
          <addr-line>Simferopol</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>372</fpage>
      <lpage>384</lpage>
      <abstract>
        <p>The paper is devoted to a special Mirror Descent algorithm for problems of convex minimization with functional constraints. The objective function may not satisfy the Lipschitz condition, but it must necessarily have a Lipshitz-continuous gradient. We assume, that the functional constraint can be non-smooth, but satisfying the Lipschitz condition. In particular, such functionals appear in the well-known Truss Topology Design problem. Also, we have applied the technique of restarts in the mentioned version of Mirror Descent for strongly convex problems. Some estimations for the rate of convergence are investigated for the Mirror Descent algorithms under consideration.</p>
      </abstract>
      <kwd-group>
        <kwd>Adaptive Mirror Descent algorithm gradient</kwd>
        <kwd>Technique of restarts</kwd>
        <kwd>Lipshitz-continuous</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        The optimization of non-smooth functionals with constraints attracts widespread
interest in large-scale optimization and its applications [
        <xref ref-type="bibr" rid="ref18 ref6">6, 18</xref>
        ]. There are various
methods of solving this kind of optimization problems. Some examples of these
methods are: bundle-level method [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], penalty method [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], Lagrange
multipliers method [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Among them, Mirror Descent (MD) [
        <xref ref-type="bibr" rid="ref12 ref4">4, 12</xref>
        ] is viewed as a simple
method for non-smooth convex optimization.
      </p>
      <p>
        In optimization problems with quadratic functionals we consider functionals
which do not satisfy the usual Lipschitz property (or the Lipschitz constant is
quite large), but they have a Lipshitz-continuous gradient. For such problems
in ([
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], item 3.3) the ideas of [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ] were adopted to construct some adaptive
Copyright c by the paper's authors. Copying permitted for private and academic purposes.
In: S. Belim et al. (eds.): OPTA-SCL 2018, Omsk, Russia, published at http://ceur-ws.org
version of Mirror Descent algorithm. For example, let Ai (i = 1; : : : ; m) be a
positive-de nite matrix: xT Aix 0 8x and the objective function
f (x) = 1miaxm fi(x)
for
fi(x) =
1
2 hAix; xi
      </p>
      <p>
        hbi; xi + i; i = 1; : : : ; m:
Note that such functionals appear in the Truss Topology Design problem with
weights of the bars [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        In this paper we propose some partial adaptive (by objective functional)
version of algorithm from ([
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], item 3.3). It simpli es work with problems where the
necessity of calculating the norm of the subgradient of the functional constraint
is burdensome in view of the large number of constraints. The idea of restarts
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is adopted to construct the proposed algorithm in the case of strongly convex
objective and constraints. It is well-known that both considered methods are
optimal in terms of the lower bounds [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Note that a functional constraint, generally, can be non-smooth. That is why
we consider subgradient methods. These methods have a long history starting
with the method for deterministic unconstrained problems and Euclidean setting
in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and the generalization for constrained problems in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], where the idea
of steps switching between the direction of subgradient of the objective and the
direction of subgradient of the constraint was suggested. Non-Euclidean
extension, usually referred to as Mirror Descent, originated in [
        <xref ref-type="bibr" rid="ref10 ref12">10, 12</xref>
        ] and was later
analyzed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. An extension for constrained problems was proposed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], see
also recent version in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. To prove faster convergence rate of Mirror Descent for
strongly convex objective in an unconstrained case, the restart technique [11{
13] was used in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Usually, the stepsize and stopping rule for Mirror Descent
requires to know the Lipschitz constant of the objective function and constraint,
if any. Adaptive stepsizes, which do not require this information, are
considered in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for problems without inequality constraints, and in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for constrained
problems.
      </p>
      <p>We consider some Mirror Descent algorithms for constrained problems in the
case of non-standard growth properties of objective functional (it has a
Lipshitzcontinuous gradient).</p>
      <p>
        The paper consists of Introduction and three main sections. In Section 2 we
give some basic notation concerning convex optimization problems with
functional constrains. In Section 3 we describe some partial adaptive version
(Algorithm 2) of Mirror Descent algorithm from ([
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], item 3.3) and prove some
estimates for the rate of convergence of Algorithm 2. The last Section 4 is
focused on the strongly convex case with restarting Algorithm 2 and corresponding
theoretical estimates for the rate of convergence.
      </p>
      <p>Problem Statement and Standard Mirror Descent
Basics
Let (E; jj jj) be a normed nite-dimensional vector space and E be the conjugate
space of E with the norm:
jjyjj = mxaxfhy; xi; jjxjj
1g;
where hy; xi is the value of the continuous linear functional y at x 2 E.</p>
      <p>Let X E be a (simple) closed convex set. We consider two convex
subdi rentiable functionals f and g : X ! R. Also, we assume that g is
Lipschitzcontinuous:
jg(x) g(y)j</p>
    </sec>
    <sec id="sec-2">
      <title>Mgjjx</title>
      <p>f (x) ! xm2iXn;
s:t: g(x)
0:
(1)
(2)
(3)</p>
      <p>Let d : X ! R be a distance generating function (d.g.f) which is continuously
di erentiable and 1-strongly convex w.r.t. the norm k k, i.e.</p>
      <p>8x; y; 2 X hrd(x)
rd(y); x
yi
kx
yk2;
and assume that min d(x) = d(0): Suppose we have a constant 0 such that
x2X
d(x ) 02; where x is a solution of (2) { (3).</p>
      <p>
        Note that if there is a set of optimal points X , then we may assume that
min d(x )
x 2X
02:
For all x; y 2 X consider the corresponding Bregman divergence
V (x; y) = d(y)
d(x)
hrd(x); y
xi:
Standard proximal setups, i.e. Euclidean, entropy, `1=`2, simplex, nuclear norm,
spectahedron can be found, e.g. in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Let us de ne the proximal mapping
operator standardly
      </p>
      <p>Mirrx(p) = arg um2iXn hp; ui + V (x; u) for each x 2 X and p 2 E :
We make the simplicity assumption, which means that Mirrx(p) is easily
computable.</p>
      <p>
        Some Mirror Descent Algorithm for the Type of
Problems Under Consideration
Following [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], given a function f for each subgradient rf (x) at a point y 2 X,
we de ne
vf (x; y) =
8
&gt;
&lt;
&gt;:0
      </p>
      <p>rf (x)
krf (x)k
; x
y ;
rf (x) 6= 0
rf (x) = 0
;
x 2 X:
(4)</p>
      <p>
        In ([
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], item 3.3) the following adaptive Mirror Descent algorithm for Problem
(2) { (3) was proposed by the rst author.
      </p>
      <sec id="sec-2-1">
        <title>Algorithm 1 Adaptive Mirror Descent, Non-Standard Growth</title>
        <p>Ensure: xN := arg minxk;k2I f (xk)</p>
        <p>" ! then
hN "
xN+1 jjrf(xN )jj</p>
        <p>M irrxN (hN rf (xN )) ("productive steps")
N ! I
else
(g(xN ) &gt; ") !
hN "
xN+1 jjrg(xN )jj2</p>
        <p>M irrxN (hN rg(xN )) ("non-productive steps")</p>
      </sec>
      <sec id="sec-2-2">
        <title>For the previous method the next result was obtained in [2].</title>
        <p>Theorem 1. Let " &gt; 0 be a xed positive number and Algorithm 1 works
steps. Then</p>
      </sec>
      <sec id="sec-2-3">
        <title>Let us remind one well-known statement (see, e.g. [5]).</title>
        <p>N =
&amp; 2 maxf1; Mg2g 02 '</p>
        <p>"2
min vf (xk; x ) &lt; ":
k2I
(5)
(6)
Lemma 1. Let f : X ! R be a convex subdi erentiable function over the convex
set X and a sequence fxkg be de ned by the following relation:</p>
        <p>xk+1 = M irrxk (hkrf (xk)):
Then for each x 2 X</p>
        <p>M irrxN (hN rg(xN )) ("non-productive steps")
and jIj is the number of "productive steps". Similarly, for "non-productive" steps
from the set J the analogous variable is de ned as follows:
(8)
(9)
(10)
1g; J = [N ]=I, where I is a collection of indexes of
hk =</p>
        <p>"</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Mgjjrf (xk)jj</title>
      <p>;
hk =</p>
      <p>"
Mg2</p>
      <p>;
jIj + jJ j = N:
and jJ j is the number of "non-productive steps". Obviously,</p>
      <sec id="sec-3-1">
        <title>Let us formulate the following analogue of Theorem 1.</title>
        <p>N =</p>
        <sec id="sec-3-1-1">
          <title>2) Similarly for the "non-productive" steps k 2 J :</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>Using (1) and jjrg(x)jj</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Mg we have</title>
        <p>hk(g(xk) g(x))
h2k
2 jjrg(xk)jj2 + V (xk; x)
Theorem 2. Let " &gt; 0 be a xed positive number and Algorithm 2 works
3) From (13) and (14) for x = x we have:
"</p>
        <p>X vf (xk; x ) + X "
Mg k2I k2J Mg2 (g(xk) g(x ))</p>
        <p>N 2M"2g2 + NX1(V (xk; x )</p>
        <p>k=0</p>
        <sec id="sec-3-2-1">
          <title>Let us note that for any k 2 J</title>
          <p>V (xk+1; x )):
and in view of</p>
          <p>N
X(V (xk; x ) V (xk+1; x )) 02
k=1
the inequality (15) can be transformed in the following way:
"</p>
          <p>X vf (xk; x )
Mg k2I</p>
          <p>N 2M"2g2 +</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>On the other hand,</title>
      </sec>
      <sec id="sec-3-4">
        <title>Assume that</title>
        <p>X vf (xk; x )
k2I</p>
        <p>jIj mk2inI vf (xk; x ):
"2
2Mg2</p>
        <p>N
02; or N
jIj M"g min vf (xk; x ) &lt; M"2g2 jIj ) min vf (xk; x ) &lt;
"
Mg
:</p>
        <p>To nish the proof we should demonstrate that jIj 6= 0. Supposing the reverse
we claim that jIj = 0 ) jJ j = N , i.e. all the steps are non-productive, so after
using
(17)
(18)
So, we have the contradiction. It means that jIj 6= 0.</p>
        <p>
          The following auxiliary assertion (see, e.g [
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ]) is full lled (x is a solution
of (2) { (3)).
        </p>
        <p>Lemma 2. Let us introduce the following function:
where is a positive number. Then for any y 2 X
!( ) = maxff (x) f (x ) : jjx
x2X
x jj</p>
        <p>g;
f (y) f (x )</p>
        <p>!(vf (y; x )):
steps of Algorithm 2 working the next estimate can be ful lled:</p>
        <p>Now we can show, how using the previous assertion and Theorem 2, one can
estimate the rate of convergence of the Algorithm 2 if the objective function f
is di erentiable and its gradient satis es the Lipschitz condition:
jjrf (x)
rf (y)jj</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Ljjx</title>
      <p>where
where
x 2 X and
Then after
"f +
steps of Algorithm 2 working the next estimate can be ful lled:
"f +
where
Remark 1. Generally, jjrf (x )jj 6= 0, because we consider some class of
constrained problems.
4</p>
      <p>On the Technique of Restarts in the Considered Version
of Mirror Descent for Strongly Convex Problems
In this subsection, we consider problem
f (x) ! min; g(x)
0; x 2 X
with assumption (1) and additional assumption of strong convexity of f and g
with the same parameter , i.e.,
f (y)
f (x) + hrf (x); y
xi + 2 ky
xk2; x; y 2 X
and the same holds for g. We also slightly modify assumptions on prox-function
d(x). Namely, we assume that 0 = arg minx2X d(x) and that d is bounded on
the unit ball in the chosen norm k kE, that is
d(x)
2
8x 2 X : kxk
1;
where is some known number. Finally, we assume that we are given a starting
point x0 2 X and a number R0 &gt; 0 such that kx0 x k2 R02.</p>
      <p>
        To construct a method for solving problem (21) under stated assumptions, we
use the idea of restarting Algorithm 2. The idea of restarting a method for convex
problems to obtain faster rate of convergence for strongly convex problems dates
back to 1980's, see e.g. [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]. To show that restarting algorithm is also possible
for problems with inequality constraints, we rely on the following Lemma (see,
e.g. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]).
      </p>
      <p>Lemma 3. If f and g are -strongly convex functionals with respect to the norm
k k on X, x = arg min f (x), g(x) 0 (8x 2 X) and "f &gt; 0 and "g &gt; 0:
x2X
Then
In conditions of Corollary 2, after Algorithm 2 stops the inequalities will be true
(23) for
"</p>
      <p>Mg
"f =</p>
      <p>krf (x )k +
and "g = ". Consider the function</p>
      <p>: R+ ! R+:
( ) = max
krf (x )k +</p>
      <p>Mg :</p>
      <sec id="sec-4-1">
        <title>It is clear that increases and therefore for each " &gt; 0 there exists '(") &gt; 0 : ('(")) = ":</title>
        <p>Remark 2. '(") depends on krf (x )k and Lipschitz constant L for rf . If
krf (x )k &lt; Mg then '(") = " for small enough ":
"2L
2Mg2
2L
2
;</p>
        <sec id="sec-4-1-1">
          <title>For another case (krf (x )k &gt; Mg) we have 8" &gt; 0:</title>
          <p>" &lt;
2(Mg
krf (x )k ) :
L
'(") =
pkrf (x )k2 + 2"L
L
krf (x )k :</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Let us consider the following Algorithm 3 for the problem (21).</title>
      </sec>
      <sec id="sec-4-3">
        <title>Algorithm 3 Algorithm for the Strongly Convex Problem</title>
        <p>Require: accuracy " &gt; 0; strong convexity parameter ; 02 s.t. d(x)
kxk 1; starting point x0 and number R0 s.t. kx0 x k2 R02.
1: Set d0(x) = d x x0 .</p>
        <p>R0
2
0 8x 2 X :
2: Set p = 1.
3: repeat
4: Set Rp2 = R02 2 p.</p>
        <p>R2p2 .
9: until p &gt; log2 2R"02 .</p>
        <p>Ensure: xp.</p>
      </sec>
      <sec id="sec-4-4">
        <title>The following theorem is ful lled.</title>
        <p>Theorem 3. Let f and g satisfy Corollary 2. If f; g are -strongly convex
functionals on X Rn and d(x) 02 8 x 2 X; kxk 1. Let the starting point
x0 2 X and the number R0 &gt; 0 be given and kx0</p>
        <p>R02
p = log2 2"
b
xp is the "-solution of Problem (2) { (3), where
b
At the same time, the total number of iterations of Algorithm 2 does not exceed
kxpb
For p = 0 this assertion is obvious by virtue of the choice of x0 and R0. Suppose
that for some p: kxp x k2 Rp2. Let us prove that kxp+1 x k2 Rp2+1. We
have dp(x ) 02 and on (p + 1)-th restart after no more than
iterations of Algorithm 2 the following inequalities are true:
f (xp+1) f (x )
"p+1; g(xp+1)
"p+1 for "p+1 =</p>
        <p>Rp2+1 :
2</p>
      </sec>
      <sec id="sec-4-5">
        <title>Then, according to Lemma 3 So, for all p 0</title>
        <p>kxp+1
x k2</p>
        <p>2"p+1 = Rp2+1:
kxp
x k2</p>
        <p>Rp2 =</p>
        <p>; f (xp) f (x )
R02
2p</p>
        <p>R02 2 p; g(xp)
2</p>
        <p>R02 2 p:
2</p>
        <p>R02 the following relation is true
For p = pb = log2 2"
kxp
x k2</p>
        <p>Rp2 = R02 2 p
2"
:</p>
        <p>It remains only to note that the number of iterations of the work of Algorithm</p>
      </sec>
      <sec id="sec-4-6">
        <title>2 is no more than</title>
        <p>'2("p+1)</p>
        <p>p
p + Xb 2 02Mg2 :
b
Acknowledgement. The authors are very grateful to Yurii Nesterov, Alexander
Gasnikov and Pavel Dvurechensky for fruitful discussions. The authors would like
to thank the unknown reviewers for useful comments and suggestions.</p>
        <p>The research by Fedor Stonyakin and Alexander Titov (Algorithm 3,
Theorem 3, Corollary 1 and Corollary 2) presented was partially supported by Russian
Foundation for Basic Research according to the research project 18-31-00219.
The research by Fedor Stonyakin (Algorithm 2 and Theorem 2) presented was
partially supported by the grant of the President of the Russian Federation for
young candidates of sciences, project no. MK-176.2017.1.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bayandina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gasnikov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gasnikova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matsievsky</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Primal-dual mirror descent for the stochastic programming problems with functional constraints</article-title>
          .
          <source>Computational Mathematics and Mathematical Physics. (accepted)</source>
          (
          <year>2018</year>
          ). https: //arxiv.org/pdf/1604.08194.pdf, (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bayandina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dvurechensky</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gasnikov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stonyakin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Titov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Mirror descent and convex optimization problems with non-smooth inequality constraints</article-title>
          .
          <source>In: LCCC Focus Period on Large-Scale and Distributed Optimization</source>
          . Sweden, Lund: Springer. (accepted) (
          <year>2017</year>
          ). https://arxiv.org/abs/1710.06612
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben-Tal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guttmann-Beck</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tetruashvili</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The comirror algorithm for solving nonsmooth constrained convex problems</article-title>
          .
          <source>Operations Research Letters</source>
          <volume>38</volume>
          (
          <issue>6</issue>
          ),
          <volume>493</volume>
          {
          <fpage>498</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teboulle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Mirror descent and nonlinear projected subgradient methods for convex optimization</article-title>
          .
          <source>Oper. Res. Lett</source>
          .
          <volume>31</volume>
          (
          <issue>3</issue>
          ),
          <volume>167</volume>
          {
          <fpage>175</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ben-Tal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nemirovski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <source>Lectures on Modern Convex Optimization. Society for Industrial and Applied Mathematics</source>
          , Philadelphia (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ben-Tal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nemirovski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Robust Truss Topology Design via semide nite programming</article-title>
          .
          <source>SIAM Journal on Optimization</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ),
          <volume>991</volume>
          {
          <fpage>1016</fpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Boyd</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandenberghe</surname>
            ,
            <given-names>L.: Convex</given-names>
          </string-name>
          <string-name>
            <surname>Optimization</surname>
          </string-name>
          . Cambridge University Press, New York (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Juditsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nemirovski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>First order methods for non-smooth convex largescale optimization. I: general purpose methods</article-title>
          .
          <source>Optimization for Machine Learning, S. Sra et al (eds.)</source>
          ,
          <volume>121</volume>
          {
          <fpage>184</fpage>
          . Cambridge, MA: MIT Press (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Juditsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nesterov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Deterministic and stochastic primal-gual subgradient algorithms for uniformly convex minimization</article-title>
          .
          <source>Stochastic Systems</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <volume>44</volume>
          {
          <fpage>80</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Nemirovskii</surname>
          </string-name>
          , A.:
          <article-title>E cient methods for large-scale convex optimization problems</article-title>
          .
          <source>Ekonomika i Matematicheskie Metody</source>
          (
          <year>1979</year>
          ), (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Nemirovskii</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nesterov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Optimal methods of smooth convex minimization</article-title>
          .
          <source>USSR Computational Mathematics and Mathematical Physics</source>
          <volume>25</volume>
          (
          <issue>2</issue>
          ),
          <volume>21</volume>
          {
          <fpage>30</fpage>
          (
          <year>1985</year>
          ), (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Nemirovsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yudin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Problem Complexity and Method E ciency in Optimization</article-title>
          . J. Wiley &amp; Sons, New York (
          <year>1983</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Nesterov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A method of solving a convex programming problem with convergence rate O(1=k2)</article-title>
          .
          <source>Soviet Mathematics Doklady</source>
          <volume>27</volume>
          (
          <issue>2</issue>
          ),
          <volume>372</volume>
          {
          <fpage>376</fpage>
          (
          <year>1983</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Nesterov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          : Introductory Lectures on Convex Optimization:
          <string-name>
            <given-names>A Basic</given-names>
            <surname>Course</surname>
          </string-name>
          . Kluwer Academic Publishers, Massachusetts (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Nesterov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Subgradient methods for convex functions with nonstandard growth properties</article-title>
          : https://www.mathnet.ru:8080/PresentFiles/16179/ growthbm_nesterov.pdf, [Online; accessed 01-April-2018]
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Polyak</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A general method of solving extremum problems</article-title>
          .
          <source>Soviet Mathematics Doklady</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <volume>593</volume>
          {
          <fpage>597</fpage>
          (
          <year>1967</year>
          ), (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Shor</surname>
            ,
            <given-names>N. Z.</given-names>
          </string-name>
          :
          <article-title>Generalized gradient descent with application to block programming</article-title>
          .
          <source>Kibernetika</source>
          <volume>3</volume>
          (
          <issue>3</issue>
          ),
          <volume>53</volume>
          {
          <fpage>55</fpage>
          (
          <year>1967</year>
          ), (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Shpirko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nesterov</surname>
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Primal-dual subgradient methods for huge-scale linear conic problem</article-title>
          .
          <source>SIAM Journal on Optimization</source>
          <volume>24</volume>
          (
          <issue>3</issue>
          ),
          <volume>1444</volume>
          {
          <fpage>1457</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Vasilyev</surname>
            ,
            <given-names>F.: Optimization</given-names>
          </string-name>
          <string-name>
            <surname>Methods. Fizmatlit</surname>
          </string-name>
          , Moscow (
          <year>2002</year>
          ), (in Russian)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>