CEUR Workshop Proceedings, Vol-1623. PDF: https://ceur-ws.org/Vol-1623/paperme5.pdf. dblp: https://dblp.org/rec/conf/door/DvurechenskyGGM16
    Primal-Dual Method for Searching Equilibrium
     in Hierarchical Congestion Population Games

    Pavel Dvurechensky1,3 , Alexander Gasnikov2,3 , Evgenia Gasnikova2 , Sergey
                Matsievsky4 , Anton Rodomanov5 , and Inna Usik4
1 Weierstrass Institute for Applied Analysis and Stochastics, Berlin 10117, Germany, pavel.dvurechensky@wias-berlin.de
2 Moscow Institute of Physics and Technology, Dolgoprudnyi 141700, Moscow Oblast, Russia, gasnikov@yandex.ru, egasnikova@yandex.ru
3 Institute for Information Transmission Problems, Moscow 127051, Russia
4 Immanuel Kant Baltic Federal University, Kaliningrad 236041, Russia, matsievsky@newmail.ru, lavinija@mail.ru
5 Higher School of Economics National Research University, Moscow 125319, Russia, anton.rodomanov@gmail.com



Abstract. In this paper, we consider a large class of hierarchical congestion population games. One can show that the equilibrium in a game of this type can be described as a minimum point of a properly constructed multi-level convex optimization problem. We propose a fast primal-dual composite gradient method and apply it to the problem dual to the problem describing the equilibrium in the considered class of games. We prove that this method allows one to find an approximate solution of the initial problem without increasing the complexity.

        Keywords: convex optimization, algorithm complexity, dual problem,
        primal-dual method, logit dynamics, multistage model of traffic flows,
        entropy, equilibrium


1     Problem Statement

In this section, we briefly describe a variational principle for the description of equilibria in hierarchical congestion population games. In particular, we consider a multistage model of traffic flows. Further details can be found in [1].
    We consider the traffic network described by the directed graph $\Gamma^1 = \langle V^1, E^1 \rangle$. Some of its vertices $O^1 \subseteq V^1$ are sources (origins), and some are sinks (destinations), $D^1 \subseteq V^1$. We denote the set of source–sink pairs by $OD^1 \subseteq O^1 \otimes D^1$. Let us assume that for each pair $w^1 \in OD^1$ there is a flow of network users of the amount $d^1_{w^1} := \bar d^1_{w^1} \cdot M$, where $M \gg 1$, per unit time, who move from the origin of $w^1$ to its destination. We call the pair $\langle w^1, d^1_{w^1} \rangle$ a correspondence.
    Let the edges of $\Gamma^1$ be partitioned into two types, $E^1 = \tilde E^1 \sqcup \bar E^1$. The edges of type $\tilde E^1$ are characterized by non-decreasing cost functions $\tau^1_{e^1}(f^1_{e^1}) := \bar\tau^1_{e^1}(f^1_{e^1}/M)$. The cost $\tau^1_{e^1}(f^1_{e^1})$ is incurred by those users whose path contains an edge $e^1 \in \tilde E^1$, the flow of users on this edge being equal to $f^1_{e^1}$. The pairs of vertices defining the edges of type $\bar E^1$ are in turn source–sink pairs $OD^2$ (with correspondences $d^2_{w^2} = f^1_{e^1}$, $w^2 = e^1 \in \bar E^1$) in a traffic network of the second level $\Gamma^2 = \langle V^2, E^2 \rangle$, whose edges are again partitioned into two types, $E^2 = \tilde E^2 \sqcup \bar E^2$. The edges of type $\tilde E^2$ are characterized by non-decreasing cost functions $\tau^2_{e^2}(f^2_{e^2}) := \bar\tau^2_{e^2}(f^2_{e^2}/M)$; the cost $\tau^2_{e^2}(f^2_{e^2})$ is incurred by those users whose path contains an edge $e^2 \in \tilde E^2$, the flow of users on this edge being equal to $f^2_{e^2}$.
    The pairs of vertices defining the edges of type $\bar E^2$ are in turn source–sink pairs $OD^3$ (with correspondences $d^3_{w^3} = f^2_{e^2}$, $w^3 = e^2 \in \bar E^2$) in a traffic network of a higher level $\Gamma^3 = \langle V^3, E^3 \rangle$, etc. We assume that in total there are $m$ levels, with $\tilde E^m = E^m$. Usually, in applications, the number $m$ is small and varies from 2 to 10.
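To make the hierarchy concrete, the following small sketch (our illustration; the toy graph, cost functions, and dictionary layout are assumptions, not the paper's notation) encodes a two-level network ($m = 2$), where a level-1 edge of type $\bar E^1$ expands into a level-2 source–sink pair whose demand equals the flow on that edge.

```python
# Illustrative sketch (not from the paper): a two-level network (m = 2).
# Level-1 edges of type "cost" carry a cost function tau(f); edges of type
# "nested" expand into a level-2 source-sink pair. All names are assumptions.

# Level-1 graph: edge -> ("cost", tau) or ("nested", (origin2, dest2))
level1_edges = {
    ("A", "B"): ("cost", lambda f: 1.0 + 0.1 * f),   # an edge in E~1
    ("B", "C"): ("nested", ("b", "c")),              # an edge in Ebar1 -> OD^2 pair
}

# Level-2 graph: all edges carry cost functions (E~2 = E2 since m = 2)
level2_edges = {
    ("b", "x"): ("cost", lambda f: 0.5 + 0.2 * f),
    ("x", "c"): ("cost", lambda f: 0.7 + 0.1 * f),
}

# A level-1 path using the edge ("B", "C") induces the correspondence
# w2 = ("b", "c") with level-2 demand d2 equal to the flow f on that edge.
f_BC = 3.0
d2 = {("b", "c"): f_BC}
print(d2)  # the flow on the nested edge becomes the level-2 demand
```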
    Let $P^1_{w^1}$ be the set of all paths in $\Gamma^1$ which correspond to a correspondence $w^1$. Each user in the graph $\Gamma^1$ chooses a path $p^1_{w^1} \in P^1_{w^1}$ (a consecutive set of the edges passed by the user) corresponding to his correspondence $w^1 \in OD^1$. Having chosen a path $p^1_{w^1}$, one can unambiguously recover the edges of type $\bar E^1$ which belong to this path. On each of these edges $w^2 \in \bar E^1$, the user can choose a path $p^2_{w^2} \in P^2_{w^2}$ ($P^2_{w^2}$ is the set of all paths in the graph $\Gamma^2$ corresponding to the correspondence $w^2$), etc. Let us assume that each user has made this choice.
    We denote by $x^1_{p^1}$ the flow of users on a path $p^1 \in P^1 = \bigsqcup_{w^1 \in OD^1} P^1_{w^1}$, by $x^2_{p^2}$ the flow of users on a path $p^2 \in P^2 = \bigsqcup_{w^2 \in OD^2} P^2_{w^2}$, etc. Let us notice that
$$x^k_{p^k_{w^k}} \ge 0, \quad p^k_{w^k} \in P^k_{w^k}, \qquad \sum_{p^k_{w^k} \in P^k_{w^k}} x^k_{p^k_{w^k}} = d^k_{w^k}, \qquad w^k \in OD^k, \quad k = 1, \dots, m,$$
and that
$$w^{k+1} = e^k \in \left(OD^{k+1} = \bar E^k\right), \qquad d^{k+1}_{w^{k+1}} = f^k_{e^k}, \qquad k = 1, \dots, m-1.$$

    For all $k = 1, \dots, m$, we introduce for the graph $\Gamma^k$ and the set of paths $P^k$ the matrix
$$\Theta^k = \left\|\delta_{e^k p^k}\right\|_{e^k \in E^k,\, p^k \in P^k}, \qquad \delta_{e^k p^k} = \begin{cases} 1, & e^k \in p^k, \\ 0, & e^k \notin p^k. \end{cases}$$
Then, for all $k = 1, \dots, m$, the vector $f^k$ of flows on the edges of the graph $\Gamma^k$ is defined in a unique way by the vector of flows on the paths $x^k = \{x^k_{p^k}\}_{p^k \in P^k}$:
$$f^k = \Theta^k x^k.$$
We introduce the following notation:
$$x = \{x^k\}_{k=1}^m, \qquad f = \{f^k\}_{k=1}^m, \qquad \Theta = \mathrm{diag}\{\Theta^k\}_{k=1}^m.$$
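The relation $f^k = \Theta^k x^k$ can be sketched on a toy single-level network (our illustration; the graph and the two paths are assumptions made up for the example):

```python
# A minimal sketch of the path-edge incidence matrix Theta and the
# relation f = Theta x on a tiny assumed graph with one O-D pair.
import numpy as np

edges = [("O", "A"), ("A", "D"), ("O", "B"), ("B", "D")]
paths = [
    [("O", "A"), ("A", "D")],   # path 1: O -> A -> D
    [("O", "B"), ("B", "D")],   # path 2: O -> B -> D
]

# Theta[e, p] = 1 if edge e belongs to path p, else 0
Theta = np.array([[1.0 if e in p else 0.0 for p in paths] for e in edges])

x = np.array([2.0, 3.0])   # flows on the two paths
f = Theta @ x              # flows on the edges
print(f.tolist())          # [2.0, 2.0, 3.0, 3.0]
```

Each edge of the first path carries the flow of that path, and the total demand $d = 5$ splits across the two paths, as in the constraints above.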
    Let us now describe the probabilistic model of the choice of path by a network user. We assume that each user $l$ of the traffic network who uses a correspondence $w^k \in OD^k$ at level $k$ (and simultaneously the edge $e^{k-1} (= w^k) \in \bar E^{k-1}$ at level $k-1$) chooses a path $p^k \in P^k_{w^k}$ if
$$p^k = \arg\max_{q^k \in P^k_{w^k}} \{-g^k_{q^k}(t) + \xi^{k,l}_{q^k}\},$$
where $\xi^{k,l}_{q^k}$ are i.i.d. random variables with the double exponential distribution (also known as the Gumbel distribution) with cumulative distribution function
$$P(\xi^{k,l}_{q^k} < \zeta) = \exp\{-e^{-\zeta/\gamma^k - E}\},$$
where $E \approx 0.5772$ is the Euler–Mascheroni constant. In this case
$$M[\xi^{k,l}_{q^k}] = 0, \qquad D[\xi^{k,l}_{q^k}] = (\gamma^k)^2 \pi^2 / 6.$$


Also, it turns out that, when the number of agents on each correspondence $w^k \in OD^k$, $k = 1, \dots, m$, tends to infinity, i.e. $M \to \infty$, the limiting distribution of users among paths is the Gibbs distribution (also known as the logit distribution)
$$x^k_{p^k} = d^k_{w^k}\,\frac{\exp(-g^k_{p^k}(t)/\gamma^k)}{\sum_{\tilde p^k \in P^k_{w^k}} \exp(-g^k_{\tilde p^k}(t)/\gamma^k)}, \qquad p^k \in P^k_{w^k},\ w^k \in OD^k,\ k = 1, \dots, m. \quad (1)$$
It is worth noting here that (see Theorem 1 below)
$$\gamma^k \psi^k_{w^k}(t/\gamma^k) = M_{\{\xi^k_{p^k}\}_{p^k \in P^k_{w^k}}}\left[\max_{p^k \in P^k_{w^k}}\{-g^k_{p^k}(t) + \xi^k_{p^k}\}\right].$$
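The connection between Gumbel-perturbed cost minimization and the logit distribution (1) can be checked empirically. In this sketch (the cost vector and $\gamma$ are made-up numbers), the frequency with which $\arg\max_q \{-g_q + \xi_q\}$ selects each path approaches the softmax probabilities; the location shift of the Gumbel noise does not matter, since the argmax is shift-invariant.

```python
# Sketch: Monte Carlo check that Gumbel-perturbed best response yields
# the logit distribution (1). Costs g and gamma are illustrative numbers.
import numpy as np

rng = np.random.default_rng(0)
g = np.array([1.0, 1.5, 3.0])   # path costs g_q(t), assumed values
gamma = 0.7

# Theoretical logit probabilities from (1)
z = np.exp(-g / gamma)
p_logit = z / z.sum()

# Each simulated user adds i.i.d. Gumbel noise and picks the best path
n = 200_000
xi = rng.gumbel(loc=0.0, scale=gamma, size=(n, g.size))
choices = np.argmax(-g + xi, axis=1)
p_emp = np.bincount(choices, minlength=g.size) / n

print(np.round(p_logit, 3), np.round(p_emp, 3))  # the two should be close
```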


    For the sake of convenience we introduce the graph
$$\Gamma = \bigsqcup_{k=1}^m \Gamma^k = \left\langle V,\ E = \bigsqcup_{k=1}^m \tilde E^k \right\rangle$$
and denote $t_e = \tau_e(f_e)$, $e \in E$.
    Assume that, for a given vector of costs $t$ on the edges $E$, identical for all users, each user chooses the shortest path at each level based on noisy information and on averaging the information from the higher levels. Then, in the limit of the number of users tending to infinity, such behavior leads to the distribution of users over paths/edges given by (1), and the equilibrium configuration of the system is characterized by the vector $t$ for which the vector $x$ obtained from (1) yields the vector $f = \Theta x$ satisfying $t = \{\tau_e(f_e)\}_{e \in E}$.
Theorem 1 (Variational principle). The fixed-point equilibrium $t$ described above can be found as a solution of the following problem (here and below we denote by $\mathrm{dom}\,\sigma^*$ the effective domain of the function conjugate to a function $\sigma$):
$$\min_{f,x}\{\Psi(x, f) : f = \Theta x,\ x \in X\} = -\min_{t \in \mathrm{dom}\,\sigma^*}\Big\{\gamma^1\psi^1(t/\gamma^1) + \sum_{e \in E}\sigma_e^*(t_e)\Big\}, \quad (2)$$
where
$$\Psi(x, f) := \Psi^1(x) = \sum_{e^1 \in \tilde E^1}\sigma^1_{e^1}(f^1_{e^1}) + \Psi^2(x) + \gamma^1\sum_{w^1 \in OD^1}\sum_{p^1 \in P^1_{w^1}} x^1_{p^1}\ln(x^1_{p^1}/d^1_{w^1}),$$
$$\Psi^2(x) = \sum_{e^2 \in \tilde E^2}\sigma^2_{e^2}(f^2_{e^2}) + \Psi^3(x) + \gamma^2\sum_{w^2 \in \bar E^1}\sum_{p^2 \in P^2_{w^2}} x^2_{p^2}\ln(x^2_{p^2}/d^2_{w^2}), \qquad d^2_{w^2} = f^1_{w^2},$$
$$\cdots$$
$$\Psi^k(x) = \sum_{e^k \in \tilde E^k}\sigma^k_{e^k}(f^k_{e^k}) + \Psi^{k+1}(x) + \gamma^k\sum_{w^k \in \bar E^{k-1}}\sum_{p^k \in P^k_{w^k}} x^k_{p^k}\ln(x^k_{p^k}/d^k_{w^k}), \qquad d^{k+1}_{w^{k+1}} = f^k_{w^{k+1}},$$
$$\cdots$$
$$\Psi^m(x) = \sum_{e^m \in E^m}\sigma^m_{e^m}(f^m_{e^m}) + \gamma^m\sum_{w^m \in \bar E^{m-1}}\sum_{p^m \in P^m_{w^m}} x^m_{p^m}\ln(x^m_{p^m}/d^m_{w^m}), \qquad d^m_{w^m} = f^{m-1}_{w^m},$$
$$\sigma_e^*(t_e) = \max_{f_e}\Big\{f_e t_e - \int_0^{f_e}\tau_e(z)\,dz\Big\},$$
$$\frac{d\sigma_e^*(t_e)}{dt_e} = \frac{d}{dt_e}\max_{f_e}\Big\{f_e t_e - \int_0^{f_e}\tau_e(z)\,dz\Big\} = f_e:\ t_e = \tau_e(f_e), \qquad e \in E,$$
$$g^m_{p^m}(t) = \sum_{e^m \in \tilde E^m}\delta_{e^m p^m}t_{e^m} = \sum_{e^m \in E^m}\delta_{e^m p^m}t_{e^m},$$
$$g^k_{p^k}(t) = \sum_{e^k \in \tilde E^k}\delta_{e^k p^k}t_{e^k} - \sum_{e^k \in \bar E^k}\delta_{e^k p^k}\gamma^{k+1}\psi^{k+1}_{e^k}(t/\gamma^{k+1}), \qquad k = 1, \dots, m-1,$$
$$\psi^k_{w^k}(t) = \ln\Big(\sum_{p^k \in P^k_{w^k}}\exp(-g^k_{p^k}(t))\Big), \qquad k = 1, \dots, m,$$
$$\psi^1(t) = \sum_{w^1 \in OD^1} d^1_{w^1}\psi^1_{w^1}(t).$$



2   General Numerical Method

In this section, we describe one of the contributions of this paper, namely a general accelerated primal-dual gradient method for composite minimization problems.
    We consider the following convex composite optimization problem [3]:
$$\min_{x \in Q}\,[\phi(x) := f(x) + \Psi(x)]. \quad (3)$$

Here $Q \subseteq E$ is a closed convex set, the function $f$ is differentiable and convex on $Q$, and the function $\Psi$ is closed and convex on $Q$ (not necessarily differentiable).
    In what follows we assume that $f$ is $L_f$-smooth on $Q$:
$$\|\nabla f(x) - \nabla f(y)\|_* \le L_f\|x - y\|, \qquad \forall x, y \in Q. \quad (4)$$
We stress that the constant $L_f > 0$ appears only in the theoretical analysis and not in the actual implementation of the proposed method. Moreover, the set $Q$ may be unbounded, and the gradient of $f$ may be unbounded on $Q$.
    The space $E$ is endowed with a norm $\|\cdot\|$ (which can be arbitrary). The corresponding dual norm is $\|g\|_* := \max_{x \in E}\{\langle g, x\rangle : \|x\| \le 1\}$, $g \in E^*$. For the mirror descent step, we need to introduce the Bregman divergence. Let $\omega : Q \to \mathbb{R}$ be a distance-generating function, i.e. a 1-strongly convex function on $Q$ in the $\|\cdot\|$-norm:
$$\omega(y) \ge \omega(x) + \langle\omega'(x), y - x\rangle + \frac{1}{2}\|y - x\|^2, \qquad \forall x, y \in Q. \quad (5)$$
Then the corresponding Bregman divergence is defined as
$$V_x(y) := \omega(y) - \omega(x) - \langle\omega'(x), y - x\rangle, \qquad x, y \in Q. \quad (6)$$
    Finally, we generalize the Grad and Mirr operators from [2] to composite functions:
$$\mathrm{Grad}_L(x) := \operatorname*{argmin}_{y \in Q}\Big\{\langle\nabla f(x), y - x\rangle + \frac{L}{2}\|y - x\|^2 + \Psi(y)\Big\}, \qquad x \in Q,$$
$$\mathrm{Mirr}^{\alpha}_{z}(g) := \operatorname*{argmin}_{y \in Q}\Big\{\langle g, y - z\rangle + \frac{1}{\alpha}V_z(y) + \Psi(y)\Big\}, \qquad g \in E^*,\ z \in Q. \quad (7)$$
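In the Euclidean setting, $\omega(x) = \frac{1}{2}\|x\|_2^2$ gives $V_z(y) = \frac{1}{2}\|y - z\|^2$, and for the classical choice $\Psi(y) = \lambda\|y\|_1$ the operator $\mathrm{Grad}_L$ has a closed-form soft-thresholding solution. The following sketch illustrates this; the choice of $\Psi$ and all numbers are illustrative assumptions, not the paper's traffic objective.

```python
# Sketch of the composite operator Grad_L from (7) in the Euclidean
# setting with Psi(y) = lam * ||y||_1 (an assumed example of Psi):
# the argmin is the soft-thresholding of the plain gradient step.
import numpy as np

def grad_step(x, grad, L, lam):
    """Grad_L(x) = argmin_y <grad, y-x> + (L/2)||y-x||^2 + lam*||y||_1."""
    u = x - grad / L  # minimizer of the smooth quadratic part
    return np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)  # soft-threshold

x = np.array([1.0, -0.2, 0.05])
g = np.array([0.5, 0.1, 0.0])
y = grad_step(x, g, L=2.0, lam=0.3)
print(y)  # [ 0.6 -0.1  0. ]
```

With $\lambda = 0$ the operator reduces to the ordinary gradient step $x - \nabla f(x)/L$, which is the case used in [2].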

2.1    Algorithm description

Below is the proposed scheme of the new method. The main differences between this algorithm and the algorithm of [2] are as follows: 1) the Grad and Mirr operators now contain the $\Psi(y)$ term; 2) the algorithm does not require the actual Lipschitz constant $L_f$; instead, it takes an arbitrary number $L_0$¹ and automatically adapts its estimate of the Lipschitz constant during the iterations; 3) a different formula for $\alpha_{k+1}$ is needed to guarantee convergence (see the next section).
    Note that Algorithm 1 is well-defined in the sense that it is always guaranteed that $\tau_k \in [0, 1]$ and, hence, $x_{k+1} \in Q$ as a convex combination of points from $Q$. Indeed, from the formula for $\alpha_{k+1}$ we have
$$\alpha_{k+1}L_{k+1} \ge \left(\sqrt{\frac{1}{4L_{k+1}^2}} + \frac{1}{2L_{k+1}}\right)L_{k+1} = 1, \quad (8)$$
therefore $\tau_k = \frac{1}{\alpha_{k+1}L_{k+1}} \le 1$.

¹ The number $L_0$ can always be set to 1 with virtually no harm to the convergence rate of the method.

Algorithm 1 Accelerated gradient method.
Require: $x_0 \in Q$: initial point; $T$: number of iterations; $L_0$: initial estimate of $L_f$.
  $y_0 \leftarrow x_0$, $z_0 \leftarrow x_0$, $\alpha_0 \leftarrow 0$
  for $k = 0, \dots, T-1$ do
    $L_{k+1} \leftarrow \max\{L_0, L_k/2\}$
    while True do
      $\alpha_{k+1} \leftarrow \sqrt{\frac{\alpha_k^2 L_k}{L_{k+1}} + \frac{1}{4L_{k+1}^2}} + \frac{1}{2L_{k+1}}$, and $\tau_k \leftarrow \frac{1}{\alpha_{k+1}L_{k+1}}$
      $x_{k+1} \leftarrow \tau_k z_k + (1 - \tau_k)y_k$
      $y_{k+1} \leftarrow \mathrm{Grad}_{L_{k+1}}(x_{k+1})$
      if $f(y_{k+1}) \le f(x_{k+1}) + \langle\nabla f(x_{k+1}), y_{k+1} - x_{k+1}\rangle + \frac{L_{k+1}}{2}\|y_{k+1} - x_{k+1}\|^2$ then
        break
      end if
      $L_{k+1} \leftarrow 2L_{k+1}$
    end while
    $z_{k+1} \leftarrow \mathrm{Mirr}^{\alpha_{k+1}}_{z_k}(\nabla f(x_{k+1}))$
  end for
  return $y_T$
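A compact sketch of Algorithm 1 in the Euclidean setting with $\Psi = 0$, so that $\mathrm{Grad}_L(x) = x - \nabla f(x)/L$ and $\mathrm{Mirr}^\alpha_z(g) = z - \alpha g$; the quadratic test function is an assumption used only to exercise the adaptive backtracking on $L_{k+1}$.

```python
# Sketch of Algorithm 1 (Euclidean prox, Psi = 0). The test objective
# below is an assumed quadratic, chosen only to exercise the line search.
import numpy as np

def algorithm1(f, grad, x0, T, L0=1.0):
    y, z, alpha, L = x0.copy(), x0.copy(), 0.0, L0
    for _ in range(T):
        Lk1 = max(L0, L / 2)
        while True:
            a = np.sqrt(alpha**2 * L / Lk1 + 1 / (4 * Lk1**2)) + 1 / (2 * Lk1)
            tau = 1 / (a * Lk1)
            x = tau * z + (1 - tau) * y
            y_new = x - grad(x) / Lk1          # Grad_{L_{k+1}}(x_{k+1})
            if f(y_new) <= (f(x) + grad(x) @ (y_new - x)
                            + Lk1 / 2 * np.sum((y_new - x) ** 2)):
                break
            Lk1 *= 2                            # backtracking on L_{k+1}
        z = z - a * grad(x)                     # Mirr step with alpha_{k+1}
        y, alpha, L = y_new, a, Lk1
    return y

A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
y = algorithm1(f, grad, np.array([3.0, 3.0]), T=100)
print(f(y))  # close to the minimum value 0
```

Note that the true Lipschitz constant $L_f = 10$ is never passed to the method; starting from $L_0 = 1$, the doubling loop keeps the working estimate below $2L_f$, as used in the convergence analysis below.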



2.2   Convergence rate

First we prove the analogues of Lemma 4.2 and Lemma 4.3 from [2].

Lemma 1. For any $u \in Q$ and $\tau_k = \frac{1}{\alpha_{k+1}L_{k+1}}$ we have
$$\begin{aligned}
\alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - u\rangle &\le \alpha_{k+1}^2 L_{k+1}(\phi(x_{k+1}) - \phi(y_{k+1})) + (V_{z_k}(u) - V_{z_{k+1}}(u))\\
&\quad + \alpha_{k+1}\Psi(u) - (\alpha_{k+1}^2 L_{k+1})\Psi(x_{k+1}) + (\alpha_{k+1}^2 L_{k+1} - \alpha_{k+1})\Psi(y_k). \quad (9)
\end{aligned}$$

Proof. From the first-order optimality condition for $z_{k+1} = \mathrm{Mirr}^{\alpha_{k+1}}_{z_k}(\nabla f(x_{k+1}))$ we get
$$\left\langle\nabla f(x_{k+1}) + \frac{1}{\alpha_{k+1}}V'_{z_k}(z_{k+1}) + \Psi'(z_{k+1}),\ z_{k+1} - u\right\rangle \le 0, \qquad \forall u \in Q. \quad (10)$$
Therefore
$$\begin{aligned}
\alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - u\rangle &= \alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - z_{k+1}\rangle + \alpha_{k+1}\langle\nabla f(x_{k+1}), z_{k+1} - u\rangle\\
&\le \alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - z_{k+1}\rangle + \langle V'_{z_k}(z_{k+1}), u - z_{k+1}\rangle + \alpha_{k+1}\langle\Psi'(z_{k+1}), u - z_{k+1}\rangle\\
&\le (\alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - z_{k+1}\rangle - \alpha_{k+1}\Psi(z_{k+1})) + \langle V'_{z_k}(z_{k+1}), u - z_{k+1}\rangle + \alpha_{k+1}\Psi(u), \quad (11)
\end{aligned}$$
where the second inequality follows from the convexity of $\Psi$.
    Using the triangle equality for the Bregman divergence,
$$\langle V'_x(y), u - y\rangle = V_x(u) - V_y(u) - V_x(y),$$
we get
$$\langle V'_{z_k}(z_{k+1}), u - z_{k+1}\rangle = V_{z_k}(u) - V_{z_{k+1}}(u) - V_{z_k}(z_{k+1}) \le V_{z_k}(u) - V_{z_{k+1}}(u) - \frac{1}{2}\|z_{k+1} - z_k\|^2, \quad (12)$$
where we have used $V_{z_k}(z_{k+1}) \ge \frac{1}{2}\|z_{k+1} - z_k\|^2$ in the last inequality.
    So we have
$$\begin{aligned}
\alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - u\rangle &\le \Big(\alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - z_{k+1}\rangle - \frac{1}{2}\|z_{k+1} - z_k\|^2 - \alpha_{k+1}\Psi(z_{k+1})\Big)\\
&\quad + (V_{z_k}(u) - V_{z_{k+1}}(u)) + \alpha_{k+1}\Psi(u). \quad (13)
\end{aligned}$$
    Define $v := \tau_k z_{k+1} + (1 - \tau_k)y_k \in Q$. Then we have $x_{k+1} - v = \tau_k(z_k - z_{k+1})$ and $\tau_k\Psi(z_{k+1}) + (1 - \tau_k)\Psi(y_k) \ge \Psi(v)$ due to the convexity of $\Psi$. Using this and the formula for $\tau_k$, we get
$$\begin{aligned}
&\alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - z_{k+1}\rangle - \frac{1}{2}\|z_{k+1} - z_k\|^2 - \alpha_{k+1}\Psi(z_{k+1})\\
&\le -\left(\frac{\alpha_{k+1}}{\tau_k}\langle\nabla f(x_{k+1}), v - x_{k+1}\rangle + \frac{1}{2\tau_k^2}\|v - x_{k+1}\|^2 + \frac{\alpha_{k+1}}{\tau_k}\Psi(v)\right) + \frac{\alpha_{k+1}(1 - \tau_k)}{\tau_k}\Psi(y_k)\\
&\le -(\alpha_{k+1}^2 L_{k+1})\left(\langle\nabla f(x_{k+1}), v - x_{k+1}\rangle + \frac{L_{k+1}}{2}\|v - x_{k+1}\|^2 + \Psi(v)\right) + (\alpha_{k+1}^2 L_{k+1} - \alpha_{k+1})\Psi(y_k)\\
&\le -(\alpha_{k+1}^2 L_{k+1})\left(\langle\nabla f(x_{k+1}), y_{k+1} - x_{k+1}\rangle + \frac{L_{k+1}}{2}\|y_{k+1} - x_{k+1}\|^2 + \Psi(y_{k+1})\right) + (\alpha_{k+1}^2 L_{k+1} - \alpha_{k+1})\Psi(y_k). \quad (14)
\end{aligned}$$
Here the last inequality follows from the definition of $y_{k+1}$.
    Note that by the termination condition for choosing $L_{k+1}$ we have
$$\begin{aligned}
\phi(y_{k+1}) &= f(y_{k+1}) + \Psi(y_{k+1})\\
&\le f(x_{k+1}) + \langle\nabla f(x_{k+1}), y_{k+1} - x_{k+1}\rangle + \frac{L_{k+1}}{2}\|y_{k+1} - x_{k+1}\|^2 + \Psi(y_{k+1})\\
&= \phi(x_{k+1}) + \langle\nabla f(x_{k+1}), y_{k+1} - x_{k+1}\rangle + \frac{L_{k+1}}{2}\|y_{k+1} - x_{k+1}\|^2 + \Psi(y_{k+1}) - \Psi(x_{k+1}). \quad (15)
\end{aligned}$$
After rearranging:
$$-\left(\langle\nabla f(x_{k+1}), y_{k+1} - x_{k+1}\rangle + \frac{L_{k+1}}{2}\|y_{k+1} - x_{k+1}\|^2 + \Psi(y_{k+1})\right) \le \phi(x_{k+1}) - \phi(y_{k+1}) - \Psi(x_{k+1}). \quad (16)$$
Hence,
$$\begin{aligned}
&\alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - z_{k+1}\rangle - \frac{1}{2}\|z_{k+1} - z_k\|^2 - \alpha_{k+1}\Psi(z_{k+1})\\
&\le (\alpha_{k+1}^2 L_{k+1})(\phi(x_{k+1}) - \phi(y_{k+1})) - (\alpha_{k+1}^2 L_{k+1})\Psi(x_{k+1}) + (\alpha_{k+1}^2 L_{k+1} - \alpha_{k+1})\Psi(y_k). \quad (17)
\end{aligned}$$
Finally, combining the previous estimates, we get
$$\begin{aligned}
\alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - u\rangle &\le (\alpha_{k+1}^2 L_{k+1})(\phi(x_{k+1}) - \phi(y_{k+1})) + (V_{z_k}(u) - V_{z_{k+1}}(u)) - (\alpha_{k+1}^2 L_{k+1})\Psi(x_{k+1})\\
&\quad + (\alpha_{k+1}^2 L_{k+1} - \alpha_{k+1})\Psi(y_k) + \alpha_{k+1}\Psi(u). \quad (18)
\end{aligned}$$
□

Lemma 2. For any $u \in Q$ and $\tau_k = \frac{1}{\alpha_{k+1}L_{k+1}}$ we have
$$\begin{aligned}
&(\alpha_{k+1}^2 L_{k+1})\phi(y_{k+1}) - (\alpha_{k+1}^2 L_{k+1} - \alpha_{k+1})\phi(y_k)\\
&\le \alpha_{k+1}(f(x_{k+1}) + \langle\nabla f(x_{k+1}), u - x_{k+1}\rangle + \Psi(u)) + (V_{z_k}(u) - V_{z_{k+1}}(u)). \quad (19)
\end{aligned}$$

Proof. Using the convexity of $f$ and the relation $\tau_k(x_{k+1} - z_k) = (1 - \tau_k)(y_k - x_{k+1})$, we obtain
$$\begin{aligned}
&\alpha_{k+1}(\Psi(x_{k+1}) - \Psi(u)) + \alpha_{k+1}\langle\nabla f(x_{k+1}), x_{k+1} - u\rangle\\
&= \alpha_{k+1}(\Psi(x_{k+1}) - \Psi(u)) + \alpha_{k+1}\langle\nabla f(x_{k+1}), x_{k+1} - z_k\rangle + \alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - u\rangle\\
&\le \alpha_{k+1}(\Psi(x_{k+1}) - \Psi(u)) + \frac{\alpha_{k+1}(1 - \tau_k)}{\tau_k}\langle\nabla f(x_{k+1}), y_k - x_{k+1}\rangle + \alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - u\rangle\\
&\le \alpha_{k+1}(\Psi(x_{k+1}) - \Psi(u)) + (\alpha_{k+1}^2 L_{k+1} - \alpha_{k+1})(f(y_k) - f(x_{k+1})) + \alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - u\rangle\\
&\le \alpha_{k+1}\phi(x_{k+1}) - \alpha_{k+1}\Psi(u) + (\alpha_{k+1}^2 L_{k+1} - \alpha_{k+1})f(y_k) - (\alpha_{k+1}^2 L_{k+1})f(x_{k+1}) + \alpha_{k+1}\langle\nabla f(x_{k+1}), z_k - u\rangle. \quad (20)
\end{aligned}$$
Now we apply Lemma 1 to bound the last term, group the terms, and get
$$\begin{aligned}
&\alpha_{k+1}(\Psi(x_{k+1}) - \Psi(u)) + \alpha_{k+1}\langle\nabla f(x_{k+1}), x_{k+1} - u\rangle\\
&\le \alpha_{k+1}\phi(x_{k+1}) - (\alpha_{k+1}^2 L_{k+1})\phi(y_{k+1}) + (\alpha_{k+1}^2 L_{k+1} - \alpha_{k+1})\phi(y_k) + (V_{z_k}(u) - V_{z_{k+1}}(u)). \quad (21)
\end{aligned}$$
After rearranging, we obtain (19). □

   Now we are ready to prove the convergence theorem for Algorithm 1.

Theorem 2. For the sequence $\{y_k\}_{k \ge 0}$ generated by Algorithm 1 we have
$$(\alpha_T^2 L_T)\phi(y_T) \le \min_{u \in Q}\left\{\sum_{k=1}^T \alpha_k(f(x_k) + \langle\nabla f(x_k), u - x_k\rangle + \Psi(u)) + V_{z_0}(u)\right\} \quad (22)$$
and, hence, the following rate of convergence:
$$\phi(y_T) - \phi(x^*) \le \frac{8L_f R^2}{T^2}. \quad (23)$$
Proof. Note that the special choice of $\{\alpha_k\}_{k \ge 0}$ in Algorithm 1 gives us
$$\alpha_{k+1}^2 L_{k+1} - \alpha_{k+1} = \alpha_k^2 L_k, \qquad k \ge 0. \quad (24)$$

Therefore, taking the sum over $k = 0, \dots, T-1$ in (19) and using $\alpha_0 = 0$ and $V_{z_T}(u) \ge 0$, we get, for any $u \in Q$,
$$(\alpha_T^2 L_T)\phi(y_T) \le \sum_{k=1}^T \alpha_k(f(x_k) + \langle\nabla f(x_k), u - x_k\rangle + \Psi(u)) + V_{z_0}(u), \quad (25)$$
and (22) is straightforward. At the same time, using the convexity of $f$, the definition of $\phi$, and taking $u = x^* = \operatorname*{argmin}_{x \in Q}\phi(x)$, we obtain
$$(\alpha_T^2 L_T)\phi(y_T) \le \min_{u \in Q}\left\{\sum_{k=1}^T \alpha_k(f(x_k) + \langle\nabla f(x_k), u - x_k\rangle + \Psi(u)) + V_{z_0}(u)\right\} \le \left(\sum_{k=1}^T \alpha_k\right)\phi(x^*) + V_{z_0}(x^*). \quad (26)$$
From (24) it follows that $\sum_{k=1}^T \alpha_k = \alpha_T^2 L_T$, so
$$\phi(y_T) \le \phi(x^*) + \frac{1}{\alpha_T^2 L_T}V_{z_0}(x^*). \quad (27)$$

Now it remains to estimate the rate of growth of the coefficients $A_k := \alpha_k^2 L_k$. For this we use the technique from [3]. Note that from (24) we have
$$A_{k+1} - A_k = \sqrt{\frac{A_{k+1}}{L_{k+1}}}. \quad (28)$$
Rearranging and using $(a + b)^2 \le 2a^2 + 2b^2$ and $A_k \le A_{k+1}$, we get
$$A_{k+1} = L_{k+1}(A_{k+1} - A_k)^2 = L_{k+1}\left(\sqrt{A_{k+1}} + \sqrt{A_k}\right)^2\left(\sqrt{A_{k+1}} - \sqrt{A_k}\right)^2 \le 4L_{k+1}A_{k+1}\left(\sqrt{A_{k+1}} - \sqrt{A_k}\right)^2. \quad (29)$$
From this it follows that $\sqrt{A_{k+1}} - \sqrt{A_k} \ge \frac{1}{2\sqrt{L_{k+1}}}$ and, summing up (recall that $A_0 = 0$),
$$\sqrt{A_{k+1}} \ge \frac{1}{2}\sum_{i=0}^{k}\frac{1}{\sqrt{L_{i+1}}}. \quad (30)$$
Note that according to (4) and the stopping criterion for choosing $L_{k+1}$ in Algorithm 1, we always have $L_i \le 2L_f$. Hence,
$$\sqrt{A_{k+1}} \ge \frac{k+1}{2\sqrt{2L_f}} \iff A_{k+1} \ge \frac{(k+1)^2}{8L_f}. \quad (31)$$
    Thus, combining (31) and (27) with $V_{z_0}(x^*) =: R^2$, we have proved (23). □
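The key identity (24) behind this proof is an exact algebraic consequence of the step-size rule in Algorithm 1 and can be checked numerically; the positive values below are arbitrary, chosen only for illustration.

```python
# Quick numeric check (with arbitrary positive numbers) that the step-size
# rule of Algorithm 1 satisfies the recurrence (24):
#   alpha_{k+1}^2 * L_{k+1} - alpha_{k+1} = alpha_k^2 * L_k.
import math

alpha_k, L_k, L_k1 = 0.8, 3.0, 5.0   # arbitrary assumed values
alpha_k1 = (math.sqrt(alpha_k**2 * L_k / L_k1 + 1 / (4 * L_k1**2))
            + 1 / (2 * L_k1))

lhs = alpha_k1**2 * L_k1 - alpha_k1
rhs = alpha_k**2 * L_k
print(abs(lhs - rhs) < 1e-9)  # True
```

Indeed, $\alpha_{k+1}$ is exactly the positive root of the quadratic equation $L_{k+1}\alpha^2 - \alpha - \alpha_k^2 L_k = 0$, which is (24).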

   Using the same arguments to [3], it is also possible to prove that the average
number of evaluations of the function f per iteration in Algorithm 1 equals 4.
Theorem 3. Let Nk be the total number of evaluations of the function f in
Algorithm 1 after the first k iterations. Then for any k ≥ 0 we have
$$N_k \le 4(k+1) + 2 \log_2 \frac{L_f}{L_0}. \qquad (32)$$
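The constant 4 in (32) comes from an evaluation-accounting argument. The toy sketch below illustrates the telescoping identity behind it; the cost model (the line search restarts from $L_k/2$, doubles on failure, and each trial costs two evaluations of $f$) is our assumption for illustration, not a detail quoted from the paper:

```python
import math
import random

# Assumed cost model: iteration k tries L = L_k / 2 first and doubles on
# failure; with `trials` attempts in total, each costing two evaluations
# of f, we get L_{k+1} = (L_k / 2) * 2**(trials - 1).
random.seed(0)
L0, L = 1.0, 1.0
evals = 0
iters = 100
for _ in range(iters):
    trials = random.randint(1, 5)       # trials of the stopping criterion
    L = (L / 2.0) * 2 ** (trials - 1)
    evals += 2 * trials

# The count telescopes exactly: N = 4 * iters + 2 * log2(L_final / L_0),
# which together with L_final <= 2 * L_f gives a bound of the form (32).
assert abs(evals - (4 * iters + 2 * math.log2(L / L0))) < 1e-9
```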

3       Application to the Equilibrium Problem
In this section, we apply Algorithm 1 to solve the dual problem in (2),
$$\min_{t \in \operatorname{dom} \sigma^*} \left\{ \gamma^1 \psi^1(t/\gamma^1) + \sum_{e \in E} \sigma_e^*(t_e) \right\},$$
with $t$ in the role of $x$, $\gamma^1 \psi^1(t/\gamma^1)$ in the role of $f(x)$, and $\sum_{e \in E} \sigma_e^*(t_e)$ in the role of $\Psi(x)$.
    Inequality (22) implies that Algorithm 1 is primal-dual [6–9], which means that the sequences $\{t^i\}$ (in the role of $\{x_k\}$) and $\{\tilde t^i\}$ (in the role of $\{y_k\}$) generated by this method have the following property:
$$\gamma^1 \psi^1(\tilde t^T/\gamma^1) + \sum_{e \in E} \sigma_e^*(\tilde t_e^T) - \min_{t \in \operatorname{dom} \sigma^*} \left\{ \frac{1}{A_T} \sum_{i=0}^{T} \alpha_i \left( \gamma^1 \psi^1(t^i/\gamma^1) + \langle \nabla \psi^1(t^i/\gamma^1), t - t^i \rangle \right) + \sum_{e \in E} \sigma_e^*(t_e) \right\} \le \frac{4 L_2 R_2^2}{T^2}, \qquad (33)$$
where
$$L_2 \le \left( 1 \Big/ \min_{k=1,\dots,m} \gamma^k \right) \sum_{w^1 \in OD^1} d_{w^1}^1 \cdot (l_{w^1})^2,$$

with $l_{w^1}$ being the total number of edges (among all of the levels) in the longest
path for correspondence $w^1$,
$$R_2^2 = \max\{\tilde R_2^2, \hat R_2^2\}, \qquad \tilde R_2^2 = \frac{1}{2} \|\bar t - t_*\|_2^2, \qquad \hat R_2^2 = \frac{1}{2} \sum_{e \in E} \left( \tau_e\left(\bar f_e^N\right) - t_e^* \right)^2,$$
$\bar f^N$ is defined in Theorem 2, the method starts from $t^0 = \bar t$, and $t_*$ is a solution of
problem (2).
Theorem 4. Let problem (2) be solved by Algorithm 1 generating sequences
$\{t^i\}$, $\{\tilde t^i\}$. Then, after $T$ iterations one has
$$0 \le \gamma^1 \psi^1(\tilde t^T/\gamma^1) + \sum_{e \in E} \sigma_e^*(\tilde t_e^T) + \Psi(\bar x^T, \bar f^T) \le \frac{4 L_2 R_2^2}{T^2},$$

where
$$f^i = \Theta x^i = -\nabla \psi^1(t^i/\gamma^1), \qquad x^i = \left\{ x_{p^k}^{k,i} \right\}_{p^k \in P_{w^k}^k,\, w^k \in OD^k}^{k=1,\dots,m},$$
$$x_{p^k}^{k,i} = d_{w^k}^k \, \frac{\exp\left(-g_{p^k}^k(t^i)/\gamma^k\right)}{\sum_{\tilde p^k \in P_{w^k}^k} \exp\left(-g_{\tilde p^k}^k(t^i)/\gamma^k\right)}, \quad p^k \in P_{w^k}^k, \; w^k \in OD^k, \; k = 1,\dots,m,$$
$$\bar f^T = \frac{1}{A_T} \sum_{i=0}^{T} \alpha_i f^i, \qquad \bar x^T = \frac{1}{A_T} \sum_{i=0}^{T} \alpha_i x^i.$$
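The path flows $x^{k,i}$ above are Gibbs (logit) distributions of the demand over the paths of each correspondence. A minimal sketch of this formula (the function name and the stabilizing shift are ours):

```python
import math

def logit_path_flows(demand, path_costs, gamma):
    """Split `demand` over paths with costs g_p according to
    x_p = d * exp(-g_p / gamma) / sum_q exp(-g_q / gamma); the common
    shift by min(g) is the standard log-sum-exp stabilization."""
    g_min = min(path_costs)
    weights = [math.exp(-(g - g_min) / gamma) for g in path_costs]
    total = sum(weights)
    return [demand * w / total for w in weights]

# Small gamma concentrates the flow on the cheapest path, recovering the
# non-smoothed assignment in the limit gamma -> 0.
flows = logit_path_flows(10.0, [1.0, 1.1, 3.0], gamma=0.05)
```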

     Theorem 2 provides the bound on the number of iterations needed to solve
problem (2) with a given accuracy. Nevertheless, on each iteration it is necessary
to calculate $\nabla \psi^1(t/\gamma^1)$ and also $\psi^1(t/\gamma^1)$. Similarly to [9–11], it is possible to
show, using a smoothed version of the Bellman–Ford method, that for this purpose
it is enough to perform $O\left(|O^1| |E| \max_{w^1 \in OD^1} l_{w^1}\right)$ arithmetic operations.
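The exact recursion from [9–11] is not reproduced here; as an illustration, the sketch below implements a standard soft-min (log-sum-exp) smoothing of Bellman–Ford, in which each of at most $\max_{w^1} l_{w^1}$ passes over the edge list costs $O(|E|)$ operations per origin, consistent with the stated complexity (all names are ours):

```python
import math

def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))

def smoothed_bellman_ford(n, edges, source, gamma, max_len):
    """Smoothed shortest-path values u_v = -gamma * log( sum over walks p
    from `source` to v with at most `max_len` edges of exp(-cost(p)/gamma) ).
    As gamma -> 0 these tend to the ordinary shortest-path distances."""
    NEG = -math.inf
    log_exact = [NEG] * n          # log-weight of walks of the current length
    log_exact[source] = 0.0
    log_total = list(log_exact)    # log-weight of walks of length <= k
    for _ in range(max_len):       # max_len passes, O(|E|) updates each
        nxt = [NEG] * n
        for u, v, t in edges:
            if log_exact[u] > NEG:
                nxt[v] = logsumexp([nxt[v], log_exact[u] - t / gamma])
        log_exact = nxt
        log_total = [logsumexp([a, b]) for a, b in zip(log_total, log_exact)]
    return [-gamma * z for z in log_total]

# Tiny graph: path 0 -> 1 -> 2 costs 1 + 1, direct edge 0 -> 2 costs 2.5.
dist = smoothed_bellman_ford(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.5)],
                             source=0, gamma=0.01, max_len=2)
```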
   In general, it is worth noting that the approach of adding artificial
vertices, edges, sources, and sinks is very useful in many applications [12–14].
   Acknowledgements. The research was supported by RFBR, research projects
No. 15-31-70001 mol_a_mos and No. 15-31-20571 mol_a_ved.


References
1. Gasnikov, A., Gasnikova, E., Matsievsky, S., Usik, I.: Searching of equilibriums in
  hierarchical congestion population games. Trudy MIPT. 28, 129–142 (2015)
2. Allen-Zhu, Z., Orecchia, L.: Linear coupling: An ultimate unification of gradient and
  mirror descent. arXiv preprint arXiv:1407.1537v4 (2014)
3. Nesterov, Yu.: Gradient methods for minimizing composite functions. Math. Prog.
  140, 125–161 (2013)
4. Nesterov, Yu., Nemirovski, A.: On first order algorithms for l1 /nuclear norm mini-
  mization. Acta Numerica. 22, 509–575 (2013)
5. Nesterov, Yu.: Universal gradient methods for convex optimization problems. CORE
  Discussion Paper 2013/63 (2013)
6. Nesterov, Yu.: Primal-dual subgradient methods for convex problems. Math. Pro-
  gram. Ser. B. 120, 261–283 (2009)
7. Nemirovski, A., Onn, S., Rothblum, U. G.: Accuracy certificates for computational
  problems with convex structure. Mathematics of Operations Research. 35, 52–78
  (2010)
8. Gasnikov, A., Dvurechensky, P., Kamzolov, D., Nesterov, Yu., Spokoiny, V., Stet-
  syuk, P., Suvorikova, A., Chernov, A.: Searching for equilibriums in multistage trans-
  port models. Trudy MIPT. 28, 143–155 (2015)
9. Gasnikov, A., Gasnikova, E., Dvurechensky, P., Ershov, E., Lagunovskaya, A.:
  Search for the stochastic equilibria in the transport models of equilibrium flow dis-
  tribution. Trudy MIPT. 28, 114–128 (2015)
10. Gasnikov, A. (ed.): Introduction to mathematical modelling of traffic flows.
  MCCME, Moscow (2013)
11. Nesterov, Yu.: Characteristic functions of directed graphs and applications to
  stochastic equilibrium problems. Optim. Engineering. 8, 193–214 (2007)
12. Gasnikov, A.: About reduction of searching competitive equilibrium to the mini-
  max problem in application to different network problems. Math. Mod. 27, 121–136
  (2015)
13. Babicheva, T., Gasnikov, A., Lagunovskaya, A., Mendel, M.: Two-stage model of
  equilibrium distributions of traffic flows. Trudy MIPT. 27, 31–41 (2015)
14. Vaschenko, V., Gasnikov, A., Molchanov, E., Pospelova, L., Shananin, A.: Analysis
  of tariff policy of a railway cargo transportation. Preprint of CCAS RAS (2014)