<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Greedy Algorithm for Sparse Monotone Regression</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexey R. Faizliev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aleksandr A. Gudkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergei V. Mironov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikhail A. Levshunov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Saratov State University</institution>
          ,
          <addr-line>Saratov</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The problem of constructing the best fitted monotone regression is an NP-hard problem and can be formulated as a convex programming problem with linear constraints. This paper proposes a simple greedy algorithm for finding a sparse monotone regression using a Frank–Wolfe-type approach. A software package for this problem has been developed and implemented in R and C++. The proposed method is compared with the well-known pool-adjacent-violators algorithm (PAVA) using simulated data.</p>
      </abstract>
      <kwd-group>
        <kwd>greedy algorithms</kwd>
        <kwd>pool-adjacent-violators algorithm</kwd>
        <kwd>isotonic regression</kwd>
        <kwd>monotone regression</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recent years have seen an increasing interest in shape-constrained estimation
in statistics [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. One such problem is constructing a monotone
regression: finding the best-fitting non-decreasing sequence for a given set
of points on the plane. A survey of results on monotone regression can be found
in the book by Robertson et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. The papers of Barlow and Brunk [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
Dykstra [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], Best and Chakravarti [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Best [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] consider the problem of finding a
monotone regression in quadratic and convex programming frameworks.
      </p>
      <p>
        Using a mathematical programming approach, the works [
        <xref ref-type="bibr" rid="ref1 ref21 ref31">1,21,31</xref>
        ] have recently
provided some new results on the topic. The papers [
        <xref ref-type="bibr" rid="ref17 ref7">7,17</xref>
        ] extend the problem to
particular orders defined by the variables of a multiple regression. The paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
investigates a dual active-set algorithm for regularized monotonic regression.
      </p>
      <p>
        Monotone regression is widely used in mathematical statistics [
        <xref ref-type="bibr" rid="ref10 ref2">2, 10</xref>
        ]; in
smoothing of empirical data [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]; in shape-preserving approximation [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ],
[
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]; in shape-preserving dynamic programming [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>When constructing a monotone regression we assume a relationship between a predictor $x = (x_1, \ldots, x_n)$ and a response $y = (y_1, \ldots, y_n)$. In the general case it is expected that $x_{i+1} - x_i \neq \mathrm{const}$ and $x_i &lt; x_{i+1}$, $i = 1, \ldots, n-1$.</p>
      <p>The work was supported by RFBR (grant 16-01-00507).</p>
      <p>A sequence $z = (z_1, \ldots, z_n) \in \mathbb{R}^n$ is called monotone if $z_i - z_{i-1} \ge 0$, $i = 2, \ldots, n$. Denote by $\mathcal{M}^n$ the set of all vectors from $\mathbb{R}^n$ which are monotone.</p>
      <p>The problem of constructing a monotone regression can be formulated as a convex programming problem with linear constraints: find a vector $z \in \mathbb{R}^n$ with the lowest mean square error of approximation to the given vector $y \in \mathbb{R}^n$ under the condition of monotonicity of $z$:
$$f(z) = \frac{1}{n}\sum_{i=1}^{n}(z_i - y_i)^2 \to \min_{z \in \mathcal{M}^n}. \quad (1)$$</p>
      <p>
        In many situations researchers have no information regarding the
mathematical specification of the true regression function; typically, one only expects
the $y_i$'s to be non-decreasing with the ordered $x_i$'s. Such a situation is called isotonic
regression. Isotonic (monotone) regression is a special case of $k$-monotone regression [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
      <p>
        It is well known that problem (1) is an NP-hard problem [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. In this paper
we present a simple greedy algorithm which employs a Frank–Wolfe-type approach
for finding a sparse monotone regression. A software package for this problem has been
developed and implemented in R and C++.
      </p>
      <p>For the convenience of solving problem (1), we pass from the points $z_i$ to their increments $\Delta_i$, where $\Delta_i = z_{i+1} - z_i$, $i = 1, \ldots, n-1$, and $\Delta_0 = z_1$. Then the monotonicity of $z$ corresponds to the non-negativity of the $\Delta_i$'s (except $\Delta_0$). The proposed method is compared with the well-known pool-adjacent-violators algorithm (PAVA) using simulated data.</p>
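      <p>In code, this change of variables and its inverse are two short loops. The following is a minimal C++ sketch (0-based arrays, so d[0] plays the role of $\Delta_0$; the function names are ours):</p>
      <p>
#include &lt;vector&gt;

// Change of variables: d[0] = z_1 and d[i] = z_{i+1} - z_i for i &gt;= 1,
// so z is recovered from d by a prefix sum.
std::vector&lt;double&gt; toIncrements(const std::vector&lt;double&gt;&amp; z) {
    std::vector&lt;double&gt; d(z.size());
    if (!z.empty()) d[0] = z[0];
    for (std::size_t i = 1; i &lt; z.size(); ++i) d[i] = z[i] - z[i - 1];
    return d;
}

std::vector&lt;double&gt; fromIncrements(const std::vector&lt;double&gt;&amp; d) {
    std::vector&lt;double&gt; z(d.size());
    double sum = 0.0;
    for (std::size_t i = 0; i &lt; d.size(); ++i) { sum += d[i]; z[i] = sum; }
    return z;
}
      </p>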
      <sec id="sec-1-1">
        <title>PAVA</title>
        <p>
          A simple iterative algorithm for solving problem (1) is the
Pool-Adjacent-Violators Algorithm (PAVA) [
          <xref ref-type="bibr" rid="ref11 ref24">11,24</xref>
          ]. The work [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] examined a generalization of
this algorithm. The paper [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] studied this problem as the problem of identifying
the active set and proposed a direct algorithm of the same complexity as
PAVA (the dual algorithm).
        </p>
        <p>
          PAVA computes a non-decreasing sequence of values $z = (z_i)_{i=1}^{n}$ such that
problem (1) is optimized. In the simple monotone regression case we have
the measurement pairs $(x_i, y_i)$; let us assume that these pairs are ordered with
respect to the predictors. The following (Algorithm 1) is pseudocode of PAVA
for the problem. The generalized pool-adjacent-violators algorithm (GPAVA),
which is a strict generalization of PAVA, was developed in the article [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ].
        </p>
        <p>
          The block values are expanded with respect to the observations $i = 1, \ldots, n$,
so that the final result is the vector $z$ of length $n$ with elements $z_i$ in
non-decreasing order [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
        <p>Algorithm 1: Pool-Adjacent-Violators Algorithm (PAVA)
begin
    Let $z_i^{(0)} := y_i$ be the start point, $l = 0$;
    The index for the blocks is $r = 1, \ldots, B$, where at step $l = 0$ we set $B := n$, i.e. each observation $z_r^{(0)}$ forms a block;
    repeat
        (Adjacent pooling) Merge values of $z^{(l)}$ into blocks if $z_{r+1}^{(l)} &lt; z_r^{(l)}$;
        Solve $f(z)$ for each block $r$, i.e., compute the update based on the solver, which gives $z_r^{(l+1)} := s(z_r^{(l)})$; the solver $s$ is the conditional (weighted) mean (or, more generally, a weighted quantile);
        If $z_r^{(l+1)} \neq z_r^{(l)}$ then $l := l + 1$;
    until the $z$-blocks are non-decreasing, i.e. $z_{r+1}^{(l)} \ge z_r^{(l)}$ for all $r$;
    Return $z$;
end</p>
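        <p>For reference, the pooling loop above fits in a few lines of C++. The following is a minimal sketch of the unweighted case (the function name is ours): blocks are kept on a stack as (mean, size) pairs and merged while adjacent block means violate monotonicity, which runs in $O(n)$ time.</p>
        <p>
#include &lt;vector&gt;

// Unweighted PAVA: the solver s is the mean of the pooled observations.
std::vector&lt;double&gt; pava(const std::vector&lt;double&gt;&amp; y) {
    std::vector&lt;double&gt; mean; // mean[r]: current value of block r
    std::vector&lt;int&gt; size;    // size[r]: number of pooled observations in block r
    for (double yi : y) {
        mean.push_back(yi);
        size.push_back(1);
        // Adjacent pooling: merge the last two blocks while they violate order.
        while (mean.size() &gt; 1 &amp;&amp; mean[mean.size() - 2] &gt; mean.back()) {
            double m2 = mean.back(); int s2 = size.back();
            mean.pop_back(); size.pop_back();
            int s1 = size.back();
            mean.back() = (mean.back() * s1 + m2 * s2) / (s1 + s2);
            size.back() = s1 + s2;
        }
    }
    // Expand block values back to one fitted value per observation.
    std::vector&lt;double&gt; z;
    for (std::size_t r = 0; r &lt; mean.size(); ++r)
        z.insert(z.end(), size[r], mean[r]);
    return z;
}
        </p>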
      </sec>
      <sec id="sec-1-2">
        <title>Frank–Wolfe type greedy algorithm</title>
        <p>
          The Frank–Wolfe method (or conditional gradient method) solves constrained convex
optimization problems in finite-dimensional vector spaces. The method was
introduced in 1956. The original algorithm did not use a fixed step size, and each of its
iterations has the complexity of a linear program. The Frank–Wolfe method was further developed by
Levitin and Polyak in 1966, and V.F. Demyanov and A.M. Rubinov generalized
it to the case of arbitrary Banach spaces in 1970 [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Recently, Frank–Wolfe-type
methods have attracted increased interest owing to the possibility of
obtaining sparse solutions, as well as their good scaling [
          <xref ref-type="bibr" rid="ref12 ref23">12, 23</xref>
          ]. In particular, [
          <xref ref-type="bibr" rid="ref22 ref34">22, 34</xref>
          ]
investigated algorithms for solving problems with penalty functions (instead of
considering the constrained optimization problems). In addition, the paper [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] uses
interlacing boosting with fixed-rank local optimization.
        </p>
        <p>As mentioned above, for the computational convenience of problem
(1), we passed from the points $z_i$ to the increments $\Delta_i = z_{i+1} - z_i$, $i = 1, \ldots, n-1$,
$\Delta_0 = z_1$. Then problem (1) can be rewritten as follows:</p>
        <p>$$g(\Delta) := \frac{1}{n}\sum_{i=1}^{n}\Big(\sum_{j=0}^{i-1}\Delta_j - y_i\Big)^2 \to \min_{\Delta \in S}, \quad (2)$$
where $S$ denotes the set of all $\Delta = (\Delta_0, \Delta_1, \ldots, \Delta_{n-1}) \in \mathbb{R}^n$ such that $\Delta_0 \in \mathbb{R}$, $(\Delta_1, \ldots, \Delta_{n-1}) \in \mathbb{R}_+^{n-1}$ and $\sum_{j=0}^{n-1} \Delta_j \le \max_i y_i$.</p>
        <p>Let $\nabla g(\Delta)$ denote the gradient of the function $g$ at the point $\Delta$.</p>
        <p>It should be noted that for larger-scale problems the solution can be
computationally quite challenging. In this regard, the present study proposes to
use a greedy algorithm of Frank–Wolfe type for solving this problem.</p>
        <p>The following (Algorithm 2) is pseudocode of the Frank–Wolfe-type algorithm
for problem (2).</p>
        <p>Algorithm 2: Greedy algorithm for sparse monotone regression
begin
    Let $N$ be the number of iterations;
    The function $g(\Delta)$ and the feasible set $S$ were defined above;
    Let $\nabla g(\Delta) = \big(\frac{\partial g}{\partial \Delta_0}, \frac{\partial g}{\partial \Delta_1}, \ldots, \frac{\partial g}{\partial \Delta_{n-1}}\big)$, where $\frac{\partial g}{\partial \Delta_k} = \frac{2}{n}\sum_{i=k+1}^{n}\Big(\sum_{j=0}^{i-1}\Delta_j - y_i\Big)$, be the gradient of the function $g$ at $\Delta$;
    Let the zero vector $\Delta^0 = (0, \ldots, 0)$ be the start point, and let the counter $t = 0$;
    while $t &lt; N$ do
        Calculate $\nabla g(\Delta^t)$, the gradient of the function $g$ at the point $\Delta^t$;
        Let $e^t$ be the solution of the linear optimization problem $\langle \nabla g(\Delta^t), \Delta \rangle \to \min_{\Delta \in S}$, where $\langle \cdot, \cdot \rangle$ is the scalar product of vectors;
        (Update step) Let $\Delta^{t+1} = \Delta^t + \gamma_t (e^t - \Delta^t)$, $\gamma_t = \frac{2}{t+2}$, and then $t := t + 1$;
    Recover the monotone sequence $z = (z_1, \ldots, z_n)$ from the vector of increments $\Delta^N$;
end</p>
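        <p>A C++ sketch of Algorithm 2 follows. One simplification is ours and is not part of the definition of $S$ above: we additionally restrict $\Delta_0 \ge 0$ (harmless for data such as the simulations below, where the responses are essentially positive), so that $S$ becomes the scaled simplex $\{\Delta \ge 0,\ \sum_j \Delta_j \le M\}$ with $M = \max_i y_i$, and the linear subproblem is solved in closed form at a vertex: either the origin or $M e_k$ for the most negative gradient coordinate $k$. Since each vertex has at most one nonzero coordinate, the cardinality of the iterate grows by at most one per iteration, as noted in the experiments below.</p>
        <p>
#include &lt;algorithm&gt;
#include &lt;vector&gt;

std::vector&lt;double&gt; greedyMonotone(const std::vector&lt;double&gt;&amp; y, int N) {
    const std::size_t n = y.size();
    const double M = *std::max_element(y.begin(), y.end());
    std::vector&lt;double&gt; d(n, 0.0); // Delta^0 = (0, ..., 0)
    std::vector&lt;double&gt; resid(n), grad(n);
    for (int t = 0; t &lt; N; ++t) {
        // Residuals z_i - y_i, where z is the prefix sum of the increments d.
        double z = 0.0;
        for (std::size_t i = 0; i &lt; n; ++i) { z += d[i]; resid[i] = z - y[i]; }
        // Gradient (0-based): grad[k] = (2/n) * sum of resid[i] over i &gt;= k,
        // since d[k] enters every fitted value z_i with i &gt;= k.
        double suffix = 0.0;
        for (std::size_t k = n; k-- &gt; 0; ) {
            suffix += resid[k];
            grad[k] = 2.0 * suffix / n;
        }
        // Linear subproblem: the vertex e^t of S minimizing the scalar product.
        std::size_t k = std::min_element(grad.begin(), grad.end()) - grad.begin();
        // Update step: Delta^{t+1} = Delta^t + gamma_t (e^t - Delta^t).
        double gamma = 2.0 / (t + 2);
        for (std::size_t i = 0; i &lt; n; ++i) {
            double e = (i == k &amp;&amp; grad[k] &lt; 0.0) ? M : 0.0;
            d[i] += gamma * (e - d[i]);
        }
    }
    // Recover the monotone sequence z from the increments Delta^N.
    std::vector&lt;double&gt; out(n);
    double sum = 0.0;
    for (std::size_t i = 0; i &lt; n; ++i) { sum += d[i]; out[i] = sum; }
    return out;
}
        </p>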
        <p>The rate of convergence is estimated according to the following theorem.</p>
        <p>Theorem 1. Let $\{\Delta^t\}$ be generated according to the Frank–Wolfe method
(Algorithm 2) using the step-size rule $\gamma_t = \frac{2}{t+2}$. Then for all $t \ge 2$
$$g(\Delta^t) - g^* \le \frac{4\sqrt{\frac{n(n+1)(2n+1)}{6n^2}}\,\big(\max_i y_i - \min_i y_i\big)^2}{t+2}, \quad (3)$$
where $g^*$ is the optimal value of (2).</p>
        <p>
          Proof. It is known [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] that for all $t \ge 2$:
$$g(\Delta^t) - g^* \le \frac{2L\,(\mathrm{Diam}(S))^2}{t+2},$$
where $L$ is the Lipschitz constant of $\nabla g$ and $\mathrm{Diam}(S)$ is the diameter of $S$.
        </p>
        <p>Let $\nabla^2 g(\Delta) := \Big(\frac{\partial^2 g(\Delta)}{\partial \Delta_k \, \partial \Delta_m}\Big)_{k,m=0}^{n-1}$ be the Hessian of $g$. It is well known that if $\nabla g$ is differentiable then its Lipschitz constant $L$ satisfies the inequality
$$L \le \sup_{\Delta} \|\nabla^2 g(\Delta)\|_2.$$
It is easy to prove that this yields $L \le \sqrt{\frac{n(n+1)(2n+1)}{6n^2}}$, and that $\mathrm{Diam}(S) = \sqrt{2}\,(\max_i y_i - \min_i y_i)$.</p>
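        <p>Combining the bound from [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] with these two estimates makes the constant in (3) explicit:
$$g(\Delta^t) - g^* \le \frac{2L\,(\mathrm{Diam}(S))^2}{t+2} \le \frac{2\sqrt{\frac{n(n+1)(2n+1)}{6n^2}} \cdot 2\,(\max_i y_i - \min_i y_i)^2}{t+2} = \frac{4\sqrt{\frac{n(n+1)(2n+1)}{6n^2}}\,(\max_i y_i - \min_i y_i)^2}{t+2}.$$</p>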
        <p>
          The disadvantage of this method is the dependence of the theoretical rate
of convergence on the dimensionality of the problem. The papers [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]
suggest using the duality gap as a stopping criterion for Frank–Wolfe-type
algorithms.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Empirical Results</title>
      <p>The algorithms have been implemented both in R and C++. We compared the
performance of the greedy algorithm (Algorithm 2) with the performance of
PAVA (Algorithm 1) using simulated data sets.</p>
      <p>It should also be noted that PAVA's speed is significantly higher for
small-scale tasks in R. However, if the number of points is greater than about 2000,
the greedy algorithm spends less time searching for a solution (Fig. 1).</p>
      <p>Tables 1 and 2 present empirical results for the PAVA and greedy algorithms on
a simulated set of points. The simulated points are obtained as the values of the
logarithm function with added normally distributed noise: $A = \{(x_i, y_i)\}$, $y_i =
\ln(x_0 + i\Delta x) + \varepsilon_i$, $\varepsilon_i \sim N(0, 1)$, $x_0 = 1$, $\Delta x = 1$, $i = 1, \ldots, 10000$. The
dimension of the problem is 10000 points. The tables contain information on the errors
$\frac{1}{n}\sum_{i=1}^{n}(z_i - y_i)^2$, the elapsed time, the cardinality, and the greedy algorithm's iteration
number.</p>
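      <p>For reproducibility, such data can be generated as follows (a C++ sketch; the seed and the function name are ours):</p>
      <p>
#include &lt;cmath&gt;
#include &lt;random&gt;
#include &lt;vector&gt;

// Simulated responses y_i = ln(x_0 + i * dx) + eps_i with eps_i ~ N(0, 1),
// x_0 = 1, dx = 1, i = 1, ..., n, as in Tables 1 and 2 (seed chosen arbitrarily).
std::vector&lt;double&gt; simulate(std::size_t n = 10000, double x0 = 1.0, double dx = 1.0) {
    std::mt19937 gen(42);
    std::normal_distribution&lt;double&gt; noise(0.0, 1.0);
    std::vector&lt;double&gt; y(n);
    for (std::size_t i = 0; i &lt; n; ++i)
        y[i] = std::log(x0 + (i + 1) * dx) + noise(gen);
    return y;
}
      </p>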
      <p>The results show that the error of the greedy algorithm approaches the error
of PAVA as the number of iterations of the greedy algorithm increases. While
PAVA is better than the greedy algorithm in terms of errors, the solutions of the greedy
algorithm are sparser. It should be noted that the elapsed time for PAVA implemented in C++
is smaller than for the greedy algorithm. However, the greedy algorithm has a better
rate of convergence if the number of iterations is less than 700 for the algorithms
implemented in R. Both algorithms obtain sparse solutions, but we can control
the number of nonzero elements (cardinality) in the greedy algorithm, as opposed
to PAVA. Generally, the greedy algorithm's cardinality increases by one at each
iteration. Consequently, we should limit the number of iterations to obtain a sparser
solution.</p>
      <p>[Fig. 1. Elapsed time versus the number of points.]</p>
      <p>Figure 2 shows simulated points (N = 100) with logarithmic structure and the
corresponding isotonic regressions, where the green line represents the greedy algorithm's
isotonic regression and the red line represents PAVA's isotonic regression. The greedy
algorithm gives a solution with 14 jumps, and PAVA one with 16 jumps. Since the solutions
of the greedy algorithm are sparser, the greedy algorithm's error ($\varepsilon$) is slightly
higher than PAVA's.</p>
      <p>The obtained empirical results for the greedy algorithm show that the rate
of convergence in the considered examples is much higher than the theoretical
estimates obtained in Theorem 1.</p>
      <p>Our research proposes an algorithm for solving the problem of constructing the
best fitted monotone regression by using the Frank–Wolfe method. The
software was implemented in R and C++. We compared the performance of the
greedy algorithm with the performance of PAVA using simulated data sets. While
PAVA gives slightly smaller errors than the greedy algorithm, the greedy algorithm
obtains significantly sparser solutions. The advantages of the greedy algorithm are the
simplicity of its implementation, the ability to control cardinality, and the lower
elapsed time of the R implementation for problems of large dimension.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ahuja</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orlin</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A fast scaling algorithm for minimizing separable convex functions subject to chain constraints</article-title>
          .
          <source>Operations Research</source>
          <volume>49</volume>
          (
          <issue>1</issue>
          ),
          <fpage>784</fpage>
          -
          <lpage>789</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bach</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Efficient algorithms for non-convex isotonic regression through submodular optimization</article-title>
          .
          <source>Tech. Rep. hal-01569934 (Jul</source>
          <year>2017</year>
          ), https://hal.archives-ouvertes.fr/hal-01569934, working paper or preprint
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Barlow</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brunk</surname>
          </string-name>
          , H.:
          <article-title>The isotonic regression problem and its dual</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          <volume>67</volume>
          (
          <issue>1</issue>
          ),
          <fpage>140</fpage>
          -
          <lpage>147</lpage>
          (
          <year>1972</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Best</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakravarti</surname>
          </string-name>
          , N.:
          <article-title>Active set algorithms for isotonic regression: a unifying framework</article-title>
          .
          <source>Mathematical Programming: Series A and B</source>
          <volume>47</volume>
          (
          <issue>3</issue>
          ),
          <fpage>425</fpage>
          -
          <lpage>439</lpage>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Best</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakravarti</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ubhaya</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Minimizing separable convex functions subject to simple chain constraints</article-title>
          .
          <source>SIAM Journal on Optimization</source>
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <fpage>658</fpage>
          -
          <lpage>672</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Boytsov</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          :
          <article-title>Linear approximation method preserving kmonotonicity</article-title>
          .
          <source>Siberian electronic mathematical reports</source>
          <volume>12</volume>
          ,
          <fpage>21</fpage>
          -
          <lpage>27</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Burdakov</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grimvall</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hussian</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A generalised PAV algorithm for monotonic regression in several variables</article-title>
          . In J Antoch (ed.),
          <source>COMPSTAT, Proceedings of the 16th Symposium in Computational Statistics</source>
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <fpage>761</fpage>
          -
          <lpage>767</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Burdakov</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sysoev</surname>
            ,
            <given-names>O.:</given-names>
          </string-name>
          <article-title>A dual active-set algorithm for regularized monotonic regression</article-title>
          .
          <source>Journal of Optimization Theory and Applications</source>
          <volume>172</volume>
          (
          <issue>3</issue>
          ),
          <fpage>929</fpage>
          -
          <lpage>949</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Judd</surname>
            ,
            <given-names>K.L.</given-names>
          </string-name>
          :
          <article-title>Shape-preserving dynamic programming</article-title>
          .
          <source>Math. Meth. Oper. Res</source>
          .
          <volume>77</volume>
          ,
          <fpage>407</fpage>
          -
          <lpage>421</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Aspects of Shape-constrained Estimation in Statistics</article-title>
          .
          <source>Ph.D. thesis</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Chepoi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cogneau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fichet</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Polynomial algorithms for isotonic regression</article-title>
          .
          <source>Lecture Notes-Monograph Series</source>
          <volume>31</volume>
          (
          <issue>1</issue>
          ),
          <fpage>147</fpage>
          -
          <lpage>160</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Clarkson</surname>
            ,
            <given-names>K.L.</given-names>
          </string-name>
          :
          <article-title>Sparse greedy approximation, and the Frank-Wolfe algorithm</article-title>
          .
          <source>ACM Transactions on Algorithms</source>
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Cullinan</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          :
          <article-title>Piecewise convex-concave approximation in the minimax norm</article-title>
          . In: Demetriou, I., Pardalos, P. (eds.) Abstracts of Conference on Approximation and Optimization: Algorithms, Complexity, and Applications, June 29-30,
          <year>2017</year>
          , Athens, Greece. p.
          <fpage>4</fpage>
          . National and Kapodistrian University of Athens (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Demyanov</surname>
            ,
            <given-names>V.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubinov</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          :
          <article-title>Approximate Methods in Optimization Problems. (Modern Analytic and Computational Methods in Science</article-title>
          and Mathematics). American Elsevier Publishing Company. New York (
          <year>1970</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Diggle</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morris</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morton-Jones</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Case-control isotonic regression for investigation of elevation in risk around a point source</article-title>
          .
          <source>Statistics in Medicine 18(13)</source>
          ,
          <fpage>1605</fpage>
          -
          <lpage>1613</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Dykstra</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>An isotonic regression algorithm</article-title>
          .
          <source>Journal of Statistical Planning and Inference</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <fpage>355</fpage>
          -
          <lpage>363</lpage>
          (
          <year>1981</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Dykstra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>An algorithm for isotonic regression for two or more independent variables</article-title>
          .
          <source>The Annals of Statistics</source>
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <fpage>708</fpage>
          -
          <lpage>719</lpage>
          (
          <year>1982</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolfe</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>An algorithm for quadratic programming</article-title>
          .
          <source>Naval Research Logistics Quarterly</source>
          <volume>3</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>95</fpage>
          -
          <lpage>110</lpage>
          (
          <year>1956</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Gal</surname>
            ,
            <given-names>S.G.</given-names>
          </string-name>
          :
          <article-title>Shape-Preserving Approximation by Real and Complex Polynomials</article-title>
          . Springer (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Gudkov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mironov</surname>
            ,
            <given-names>S.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faizliev</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          :
          <article-title>On the convergence of a greedy algorithm for the solution of the problem for the construction of monotone regression</article-title>
          .
          <source>Izv. Saratov Univ. (N. S.)</source>
          ,
          <source>Ser. Math. Mech. Inform</source>
          .
          <volume>17</volume>
          ,
          <fpage>431</fpage>
          -
          <lpage>440</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Hansohm</surname>
          </string-name>
          , J.:
          <article-title>Algorithms and error estimations for monotone regression on partially preordered sets</article-title>
          .
          <source>Journal of Multivariate Analysis</source>
          <volume>98</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1043</fpage>
          -
          <lpage>1050</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Harchaoui</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Juditsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nemirovski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Conditional gradient algorithms for norm-regularized smooth convex optimization</article-title>
          .
          <source>Mathematical Programming: Series A and B</source>
          <volume>152</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>75</fpage>
          -
          <lpage>112</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Jaggi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Sparse Convex Optimization Methods for Machine Learning</article-title>
          .
          <source>Ph.D. thesis</source>
          , ETH Zürich (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>de Leeuw</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hornik</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mair</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Isotone optimization in R: Pool-adjacent-violators algorithm (PAVA) and active set methods</article-title>
          .
          <source>Journal of Statistical Software</source>
          <volume>32</volume>
          (
          <issue>5</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wright</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dykstra</surname>
          </string-name>
          , R.:
          <source>Order Restricted Statistical Inference</source>
          . John Wiley &amp; Sons, New York (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Shevaldin</surname>
          </string-name>
          , V.T.:
          <article-title>Local approximation by splines</article-title>
          .
          <source>UrO RAN</source>
          , Ekaterinburg
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          :
          <article-title>Linear k-monotonicity preserving algorithms and their approximation properties</article-title>
          .
          <source>LNCS 9582</source>
          ,
          <fpage>93</fpage>
          -
          <lpage>106</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mironov</surname>
            ,
            <given-names>S.V.</given-names>
          </string-name>
          :
          <article-title>Duality gap analysis of weak relaxed greedy algorithms</article-title>
          .
          <source>LNCS 10556</source>
          ,
          <fpage>251</fpage>
          -
          <lpage>262</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mironov</surname>
            ,
            <given-names>S.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pleshakov</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          :
          <article-title>Dual convergence estimates for a family of greedy algorithms in Banach spaces</article-title>
          .
          <source>LNCS</source>
          <volume>10710</volume>
          (
          <year>2018</year>
          ), in press
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>On the saturation effect for linear shape-preserving approximation in Sobolev spaces</article-title>
          .
          <source>Miskolc Mathematical Notes</source>
          <volume>16</volume>
          (
          <issue>2</issue>
          ),
          <fpage>1191</fpage>
          -
          <lpage>1197</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Stromberg</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>An algorithm for isotonic regression with arbitrary convex distance function</article-title>
          .
          <source>Computational Statistics &amp; Data Analysis</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <fpage>205</fpage>
          -
          <lpage>219</lpage>
          (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woodroofe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mentz</surname>
          </string-name>
          , G.:
          <article-title>Isotonic regression: Another look at the changepoint problem</article-title>
          .
          <source>Biometrika</source>
          <volume>88</volume>
          (
          <issue>3</issue>
          ),
          <fpage>793</fpage>
          -
          <lpage>804</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xing</surname>
            ,
            <given-names>E.P.</given-names>
          </string-name>
          :
          <article-title>Exact algorithms for isotonic regression and related</article-title>
          .
          <source>Journal of Physics: Conference Series</source>
          <volume>699</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuurmans</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Accelerated training for matrix-norm regularization: A boosting approach</article-title>
          .
          <source>NIPS'12 Proceedings of the 25th International Conference on Neural Information Processing Systems</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>2906</fpage>
          -
          <lpage>2914</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>