<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparative analysis of stochastic optimization algorithms for image registration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S Voronov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I Voronov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R Kovalenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ulyanovsk State Technical University</institution>
          ,
          <addr-line>Severniy Venets 32, Ulyanovsk, Russia, 432027</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>123</fpage>
      <lpage>130</lpage>
      <abstract>
        <p>This work is devoted to comparative experimental analysis of different stochastic optimization algorithms for image registration in spatial domain: stochastic gradient descent, Momentum, Nesterov momentum, Adagrad, RMSprop, Adam. Correlation coefficient is considered as the objective function. Experiments are performed on synthetic data generated via wave model with different noise-to-signal ratio.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Digital image registration is a process by which the most accurate match is determined between two
images, which may have been taken at the same or different times, by the same or different sensors,
from the same or different viewpoints. The registration process determines the optimal transformation,
which will align the two images. This has applications in many fields as diverse as medical image
analysis, pattern matching and computer vision for robotics, as well as remotely sensed data
processing. In all of these domains, image registration can be used to find changes in images taken at
different times, or for object recognition and tracking.</p>
      <p>
        Spatial domain methods operate directly on pixels, and the problem of estimating the registration
parameters α becomes the problem of searching for the extreme point of a multi-dimensional
objective function J(Z, α). The objective function measures the similarity between two images
Z^{(1)} = {z_j^{(1)}} and Z^{(2)} = {z_j^{(2)}}, where j ∈ Ω are the nodes of the grid mesh Ω on which the images are
defined. There is a wide variety of similarity measures that can be used as objective functions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The decision of which objective function to choose is usually based on the specifics of the images,
the deformation properties and the imaging conditions. Recently, objective functions from information
theory have become more popular. Among these functions the most interesting is mutual information. It has
been found to be especially robust for multimodal image registration and for registration of images with
strong non-linear intensity distortions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, mutual information has some drawbacks, one of
which is its relatively high computational complexity.
      </p>
      <p>
        The choice of optimization search technique depends on the type of problem under consideration.
Traditional nonlinear programming methods, such as the constrained conjugate gradient, or the
standard back propagation in neural network applications, are well suited to deterministic optimization
problems with exact knowledge of the gradient of the objective function. Optimization algorithms
have been developed for a stochastic setting where randomness is introduced either in the noisy
measurements of the objective function and its gradient, or in the computation of the gradient
approximation. Stochastic gradient ascent (descent) is one of the most powerful techniques of this
class [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It is an iterative algorithm, where the registration parameters can be found as follows [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
α̂_t = α̂_{t-1} − Λ_t β_t(J(Z_t, α̂_{t-1})),
where β_t is the gradient estimation vector of the objective function J, obtained using not every pixel in the
images but a sample Z_t taken randomly on each iteration; Λ_t is a positive-definite gain (learning rate)
matrix: Λ_t = diag{λ_it}, λ_it > 0, i = 1, …, m; m is the number of registration parameters.
      </p>
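      <p>For illustration, a minimal Python sketch of such an update loop is given below. The similarity function, the random sampling routine and the finite-difference gradient estimator are simplified assumptions introduced here for clarity; they are not the implementation used in the experiments.</p>
      <preformat>
import numpy as np

def estimate_gradient(J, sample, alpha, eps=1e-3):
    # Finite-difference estimate of the objective gradient beta_t on a random sample Z_t
    grad = np.zeros_like(alpha)
    for i in range(alpha.size):
        step = np.zeros_like(alpha)
        step[i] = eps
        grad[i] = (J(sample, alpha + step) - J(sample, alpha - step)) / (2 * eps)
    return grad

def sgd_registration(J, draw_sample, alpha0, lam, n_iter=1000):
    # Basic stochastic gradient descent over the registration parameter vector alpha
    alpha = np.asarray(alpha0, dtype=float)
    for _ in range(n_iter):
        sample = draw_sample()                     # random pixel sample Z_t
        beta = estimate_gradient(J, sample, alpha)
        alpha = alpha - lam * beta                 # alpha_t = alpha_{t-1} - Lambda_t * beta_t
    return alpha
      </preformat>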
      <p>
The main disadvantage of this optimization algorithm is the presence of a large number of local
extreme points of the objective function, due to the use of small samples and the relatively short working
range in terms of the registration parameters to be estimated. To overcome these problems, the number of
sample elements can be increased. However, this leads to a significant increase in computational effort.
Another significant problem is choosing the hyperparameter Λ_t, as it largely affects not only the
convergence rate but also the estimation accuracy. Thus, the optimization of the stochastic
gradient algorithm for image registration is an important problem, especially for real-time processing systems.
To overcome the mentioned problems some modifications of the “classical” stochastic gradient
descent have been proposed. This paper is devoted to comparative experimental analysis of these
modifications: Momentum, Nesterov momentum, Adagrad, RMSprop, Adam. These algorithms are
very effective, especially in training artificial neural networks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Stochastic optimization algorithms</title>
      <p>Let us consider the most popular modifications of stochastic gradient descent optimization algorithm
which can be used for solving image registration problem.</p>
      <sec id="sec-2-1">
        <title>2.1. Momentum</title>
        <p>
          The idea behind Momentum optimization is quite simple [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]: imagine a bowling ball rolling
down a gentle slope on a smooth surface: it starts out slowly, but it quickly picks up momentum
until it eventually reaches terminal velocity (if there is some friction or air resistance). In contrast,
regular gradient descent simply takes small regular steps down the slope, so it takes much more
time to reach the bottom. Recall that gradient descent simply updates the parameter estimates α̂ by
directly subtracting the gradient of the cost function with regard to the parameters, J(Z, α), multiplied
by the learning rate λ > 0. It does not care what the earlier gradients were. If the local gradient
is tiny, it goes very slowly. Momentum optimization cares a great deal about what the previous gradients
were: at each iteration, it adds the local gradient, multiplied by the learning rate λ, to the momentum
vector m, and it updates the weights by simply subtracting this momentum vector. In other
words, the gradient is used as an acceleration, not as a speed. To simulate some sort of friction
mechanism and prevent the momentum from growing too large, the algorithm introduces a new
hyperparameter h, simply called the momentum, which must be set between 0 (high friction) and 1
(no friction). A typical momentum value is 0.9. Thus, the equation for parameter estimate updates can
be written as follows:
        </p>
        <p>ˆt  ˆt1  mt ,
where mt  hmt1  Λt  t (J(Zt , t1) .</p>
        <p>One can easily verify that if the gradient remains constant, the terminal velocity (i.e. the maximum
size of the weight updates) is equal to that gradient multiplied by the learning rate λ multiplied by
1/(1 − h). For example, if h = 0.9, then the terminal velocity is equal to 10 times the gradient times the
learning rate, so Momentum optimization ends up going 10 times faster than “classical” stochastic
gradient descent. This allows Momentum optimization to escape from plateaus much faster.</p>
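        <p>A minimal sketch of this update in Python (the variable names and the scalar learning rate are illustrative assumptions, not the authors' implementation) is the following:</p>
        <preformat>
import numpy as np

def momentum_step(alpha, m, grad, lr=0.01, h=0.9):
    # grad is the gradient estimate beta_t of the cost function at alpha_{t-1}
    m = h * m + lr * grad      # m_t = h * m_{t-1} + lambda * beta_t
    alpha = alpha - m          # alpha_t = alpha_{t-1} - m_t
    return alpha, m
        </preformat>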
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Nesterov momentum</title>
        <p>
          In [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] the author proposes one small variant to Momentum optimization which is almost always faster
than vanilla Momentum optimization. The idea behind Nesterov momentum optimization consists in
measuring the gradient of the cost function not at the local position but slightly ahead in the direction
of the momentum. Hence, the only difference from vanilla Momentum optimization is that the
gradient is measured on t -th iteration at the point ˆt1  hmt rather than at ˆt1 :
ˆt ˆt1  hmt1  Λt  t J(Zt ,ˆt1  hmt .
        </p>
        <p></p>
        <p>This small tweak works because in general the momentum vector will be pointing in the right
direction (i.e., toward the optimum), thus it will be slightly more accurate to use the gradient measured
a bit farther in that direction rather than the gradient at the original position. Figure 1 shows this
effect. Here β_1 represents the gradient of the cost function measured at the starting point α̂_{t-1}, and
β_2 represents the gradient at the point located at α̂_{t-1} − h m_{t-1}. As one can see, the Nesterov update
ends up slightly closer to the optimum. After a while, these small improvements add
up and the procedure ends up being significantly faster than regular Momentum optimization.
Moreover, we should note that when the momentum pushes the weights across a valley, β_1 continues
to push further across the valley, while β_2 pushes back toward the bottom of the valley. This helps
reduce oscillations and thus the algorithm converges faster.</p>
        <p>[Figure 1: gradient steps λβ_1 (regular Momentum) and λβ_2 (Nesterov) with momentum vector m on the cost function J.]</p>
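        <p>A corresponding Python sketch of the Nesterov update (again with an illustrative scalar learning rate and a user-supplied gradient estimator) could look as follows:</p>
        <preformat>
import numpy as np

def nesterov_step(alpha, m, grad_fn, lr=0.01, h=0.9):
    # grad_fn(point) returns the gradient estimate beta_t of the cost function at that point
    lookahead = alpha - h * m      # alpha_{t-1} - h * m_{t-1}
    grad = grad_fn(lookahead)      # gradient measured slightly ahead in the momentum direction
    m = h * m + lr * grad          # m_t
    alpha = alpha - m              # alpha_t
    return alpha, m
        </preformat>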
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Adagrad</title>
        <p>Consider the elongated bowl problem: gradient descent starts by quickly going down the
steepest slope, then slowly goes along the bottom of the valley. It would be better if the
algorithm could detect this early on and correct its direction to point a bit more toward the global
optimum.</p>
        <p>
          The Adagrad algorithm [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] achieves this by scaling down the gradient vector along the steepest
dimensions:
        </p>
        <p>ˆt ˆt1  Λt  t JZt ,ˆt1 //ct  1 2 ,
where ct  ct1   t JZt ,ˆt1 2 .</p>
        <p>The first step of this algorithm on each iteration is accumulating the squares of the gradients into the
vector c_t. If the cost function is steep along the i-th dimension, then c_it will get larger and larger at
each iteration. The second step is almost identical to “classical” stochastic gradient descent, but with
one big difference: the gradient vector is scaled down by a factor of (c_t + ε)^{1/2} (the // symbol
represents element-wise division, and ε is a smoothing term to avoid division by zero, typically
set to 10⁻⁸). In short, this algorithm decays the learning rate, but it does so faster for steep dimensions
than for dimensions with gentler slopes. This is called an adaptive learning rate. It helps point the
resulting updates more directly toward the global optimum. One additional benefit is that it requires
much less tuning of the learning rate hyperparameter.</p>
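        <p>A short Python sketch of one Adagrad step, under the same simplifying assumptions (scalar learning rate, externally supplied gradient estimate), is given below:</p>
        <preformat>
import numpy as np

def adagrad_step(alpha, c, grad, lr=0.01, eps=1e-8):
    # Accumulate squared gradients, then scale the step element-wise
    c = c + grad ** 2                              # c_t = c_{t-1} + beta_t ** 2
    alpha = alpha - lr * grad / np.sqrt(c + eps)   # division by (c_t + eps) ** 0.5 is element-wise
    return alpha, c
        </preformat>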
        <p>Adagrad often performs well for simple quadratic problems, but unfortunately it often stops too
early when training neural networks. The learning rate gets scaled down so much that the algorithm
ends up stopping entirely before reaching the global optimum.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. RMSprop</title>
        <p>
          Although Adagrad slows down a bit too fast and ends up never converging to the global optimum, the
RMSProp algorithm [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] fixes this by accumulating only the gradients from the most recent iterations
(as opposed to all the gradients since the beginning of training). It does so by using exponential decay
in the first step:
        </p>
        <p>c_t = d c_{t-1} + (1 − d)(β_t(J(Z_t, α̂_{t-1})))²,
where d is the decay rate, typically set to 0.9.</p>
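        <p>The corresponding Python sketch differs from the Adagrad one only in how the squared gradients are accumulated (the parameter values are the typical ones mentioned above, not values tuned for registration):</p>
        <preformat>
import numpy as np

def rmsprop_step(alpha, c, grad, lr=0.01, d=0.9, eps=1e-8):
    # Exponentially decaying average of squared gradients instead of a running sum
    c = d * c + (1 - d) * grad ** 2
    alpha = alpha - lr * grad / np.sqrt(c + eps)
    return alpha, c
        </preformat>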
        <p>
          Except on very simple problems, this optimizer almost always performs much better than Adagrad.
It also generally performs better than Momentum optimization and Nesterov momentum. In fact, it
was the preferred optimization algorithm of many researchers until Adam optimization came around.
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Adam</title>
        <p>
          Adam [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], which stands for adaptive moment estimation, combines the ideas of Momentum
optimization and RMSProp: just like Momentum optimization it keeps track of an exponentially
decaying average of past gradients, and just like RMSProp it keeps track of an exponentially decaying
average of past squared gradients:
m_t^A = (d_1 m_{t-1}^A + (1 − d_1) β_t(J(Z_t, α̂_{t-1}))) / (1 − d_1^t),
c_t^A = (d_2 c_{t-1}^A + (1 − d_2)(β_t(J(Z_t, α̂_{t-1})))²) / (1 − d_2^t),
α̂_t = α̂_{t-1} − Λ_t m_t^A // (c_t^A + ε)^{1/2}.
        </p>
        <p>One can notice the similarity of the Adam update rule to both Momentum optimization and RMSProp.
The only difference is that it computes an exponentially decaying average rather than an exponentially
decaying sum for m_t^A and c_t^A, but these are actually equivalent except for a constant factor, as the
decaying average is just (1 − d_1) and (1 − d_2) times the decaying sum respectively. The momentum
decay hyperparameter d_1 is typically initialized to 0.9, while the scaling decay hyperparameter d_2 is
often initialized to 0.999. As earlier, the smoothing term ε is usually initialized to a tiny number
such as 10⁻⁸. In fact, since Adam is an adaptive learning rate algorithm like both Adagrad and
RMSProp, it requires less tuning of the learning rate hyperparameter Λ_t.</p>
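        <p>A minimal Python sketch of one Adam step with the bias-corrected averages (the iteration counter t starts from 1; the hyperparameter values are the typical defaults cited above) may look as follows:</p>
        <preformat>
import numpy as np

def adam_step(alpha, m, c, grad, t, lr=0.001, d1=0.9, d2=0.999, eps=1e-8):
    # Decaying averages of the gradients and of the squared gradients
    m = d1 * m + (1 - d1) * grad
    c = d2 * c + (1 - d2) * grad ** 2
    m_hat = m / (1 - d1 ** t)      # bias correction, t counted from 1
    c_hat = c / (1 - d2 ** t)
    alpha = alpha - lr * m_hat / np.sqrt(c_hat + eps)
    return alpha, m, c
        </preformat>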
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and analysis</title>
      <sec id="sec-3-1">
        <title>3.1. Synthetic data</title>
        <p>
For efficiency analysis of different optimization algorithms it is reasonable to use simulated images
whose intensity probability distribution function and correlation function can be defined a priori during
their synthesis. In the conducted experiments, simulated images based on the wave model [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ] with
intensity probability distribution and correlation functions close to Gaussian and with different
correlation radii were used. In addition, unbiased Gaussian noise was added in the simulations. Figure
2 shows an example of such a synthesized image.
        </p>
        <p>
In order to measure the performance of the optimization algorithms, we tested them on images with
different noise-to-signal ratios and with different μ, the number of points in the sample used for
estimation of the gradient of the chosen objective function. The correlation coefficient [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is chosen as the
objective function to be optimized. The similarity transformation is considered as the deformation model to be
estimated. For all of the results below, the deformation parameters are the following: horizontal shift ‒
20 pixels to the right, vertical shift ‒ 15 pixels upwards, clockwise rotation ‒ 17 degrees, scale factor ‒
0.9. In each experiment, optimal hyperparameters were chosen experimentally, as different algorithms
perform better with different hyperparameters and their choice is out of the scope of this article.
        </p>
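        <p>For reference, a minimal Python sketch of the correlation coefficient between two intensity samples drawn at corresponding grid nodes is shown below; the sampling and interpolation details of the actual experiments are omitted here, and the function name is illustrative:</p>
        <preformat>
import numpy as np

def correlation_coefficient(z1, z2):
    # Sample correlation coefficient between two intensity samples of equal size
    z1 = np.asarray(z1, dtype=float) - np.mean(z1)
    z2 = np.asarray(z2, dtype=float) - np.mean(z2)
    denom = np.sqrt(np.sum(z1 ** 2) * np.sum(z2 ** 2))
    return float(np.sum(z1 * z2) / denom) if denom > 0 else 0.0
        </preformat>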
        <p>
The number of iterations before convergence of the mismatch Euclidean distance [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] E, which is an
integral measure of the convergence of the registration parameters, is used as the performance criterion.
Moreover, all the results are averaged over 50 realizations to make them more consistent and
reproducible.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.1. Different sample size</title>
        <p>Figure 3 shows the convergence of mismatch Euclidean distance for the algorithms with different
sample size μ . Hereafter, curve 1 corresponds to “classical” stochastic gradient descent, 2 ‒ stochastic
gradient descent with Momentum, 3 – Nesterov momentum (plus markers), 4 – Adagrad (plus
markers), 5 – RMSprop (dashed line), 6 – Adam (dashed line).</p>
        <p>One can see that in both cases “classical” stochastic gradient descent shows the worst result, as it
starts to converge only after 800 iterations for μ = 5 and after 700 iterations for μ = 20. The Momentum
optimization algorithm performs almost identically, but slightly better for μ = 20. The best results in
both cases are provided by the Adam and RMSprop optimizations, as they start to converge after 300
iterations for μ = 5 and after 200 iterations for μ = 20. Moreover, it is obvious that the Adam algorithm
in both situations has less variance than RMSprop, thus we can conclude that it is more stable and
hence preferable. The Adagrad and Nesterov momentum algorithms show close results in terms of the number
of iterations before convergence (500 iterations for μ = 5 and 450 iterations for μ = 20), but in
the beginning Adagrad has a much faster convergence rate, and with some optimization (e.g. increasing
or dropping learning rates after a number of iterations) it could possibly outperform Nesterov
momentum.</p>
        <p>Also, we can conclude that all of the algorithms have a better convergence rate with a bigger sample
size. This is reasonable from a theoretical point of view, because the objective function gradient estimates
become less noisy.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.1.2. Images with different signal-to-noise ratio</title>
        <p>Let us test the algorithms in the case of noisy images with different signal-to-noise ratios q. In this
experiment we used the sample size μ = 5 for each algorithm and each q. Figure 4 shows the
convergence of the algorithms for q = 50 and q = 2.</p>
        <p>As in the previous experiment, “classical” stochastic gradient descent shows the slowest
convergence rate in both set-ups. The Adam and RMSprop algorithms have the best results. For q = 50
their curves are almost identical, but when the noise increases Adam has a much faster convergence rate
in the beginning, hence it can potentially show a better result. In addition, we can notice that with very
intense noise (q = 2) Adagrad converges with a bigger error in comparison with the other algorithms and it
performs almost the same as “classical” stochastic gradient descent. In order to reduce the error we
can choose smaller learning rates. However, with a smaller rate it sometimes was not able to converge at
all.</p>
        <p>Additionally, it is clear that noise affects not only the convergence rate but also the variance of
estimates for all of the algorithms as the curves become less smooth.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.2. Real data</title>
        <p>Satellite images were used for the comparative analysis on real data. Figure 5 shows an example of the
images taken in different weather conditions.</p>
        <p>Figure 6 shows an example of the mismatch Euclidean distance convergence of the algorithms for
the images shown in figure 5. For this experiment μ was set to 25 and the results were averaged over 100
realizations, as they became noisier in comparison with the synthesized images, especially for the
algorithms with adaptive learning rates.</p>
        <p>One can easily notice that the results on real data are almost identical to the results on synthesized
images. Again, the Adam and RMSprop algorithms are the fastest in terms of convergence rate. However,
here we can see that the algorithms with adaptive learning rates have much noisier curves than the others. This
can be explained by the fact that when dealing with real images we have noisier gradient estimation,
thus in these algorithms the learning rate adaptation becomes less stable.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>The comparative analysis of different optimization algorithms for solving the image registration problem
in the spatial domain shows that in each case “classical” stochastic gradient descent gives the worst result
in terms of the convergence rate of the registration parameters’ estimates. The Momentum optimization
algorithm only slightly outperforms it. The Adagrad and Nesterov momentum algorithms show close results
in terms of the number of iterations before convergence, but, except for the situation with intense noise,
Adagrad has a much faster convergence rate in the beginning, and with some optimization (e.g.
increasing or dropping learning rates after a number of iterations) it could possibly outperform Nesterov
momentum. The best results are provided by the Adam and RMSprop optimizations. Moreover, it is
obvious that the Adam algorithm is almost always preferable as it has less variance than RMSprop.</p>
      <p>Furthermore, we can conclude that all of the algorithms have a better convergence rate with a bigger
sample size. This is reasonable from a theoretical point of view, because the objective function gradient
estimates become less noisy. Additionally, it is clear that noise affects not only the convergence rate
but also the variance of the estimates for all of the algorithms, as the curves become less smooth.</p>
      <p>Experiments on real satellite images show mostly identical results.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the Russian Foundation for Basic Research, projects no. 16-41-732084
and 16-31-00468.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Goshtasby</surname>
            <given-names>A A</given-names>
          </string-name>
          <year>2012</year>
          <article-title>Image registration</article-title>
          .
          <source>Principles, tools and methods</source>
          (Springer London Dordrecht Heidelberg New York) p
          <fpage>441</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Tashlinskii</surname>
            <given-names>A G</given-names>
          </string-name>
          and
          <string-name>
            <surname>Voronov</surname>
            <given-names>S V</given-names>
          </string-name>
          <year>2013</year>
          <source>Proc. 11th Int. Conf. Pattern Recognition and Image Analysis: New Information Technologies</source>
          <volume>1</volume>
          <fpage>326</fpage>
          -
          <lpage>329</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Papamakarios</surname>
            <given-names>G</given-names>
          </string-name>
          <year>2014</year>
          <article-title>Comparison of Modern Stochastic Optimization Algorithms</article-title>
          (University of Edinburgh) p
          <fpage>13</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Tashlinskii</surname>
            <given-names>A G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Safina</surname>
            <given-names>G L</given-names>
          </string-name>
          and
          <string-name>
            <surname>Voronov</surname>
            <given-names>S V</given-names>
          </string-name>
          <year>2012</year>
          <article-title>Pseudogradient optimization of objective function in estimation of geometric interframe image deformations</article-title>
          <source>Pattern Recognition and Image Analysis</source>
          <volume>22</volume>
          <fpage>386</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Goodfellow</surname>
            <given-names>I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            <given-names>Y</given-names>
          </string-name>
          and
          <string-name>
            <surname>Courville</surname>
            <given-names>A</given-names>
          </string-name>
          <year>2016</year>
          <source>Deep Learning</source>
          (MIT Press) p
          <fpage>800</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Polyak</surname>
            <given-names>B T</given-names>
          </string-name>
          <year>1964</year>
          <article-title>Some methods of speeding up the convergence of iteration methods</article-title>
          <source>Computational Mathematics and Mathematical Physics</source>
          <volume>4</volume>
          <fpage>1</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Nesterov</surname>
            <given-names>Y A</given-names>
          </string-name>
          <year>1983</year>
          <article-title>A method of solving a convex programming problem with convergence rate O(1/k2)</article-title>
          <source>Soviet Mathematics Doklady</source>
          <volume>27</volume>
          <fpage>372</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Duchi</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hazan</surname>
            <given-names>E</given-names>
          </string-name>
          and
          <string-name>
            <surname>Singer</surname>
            <given-names>Y</given-names>
          </string-name>
          <year>2011</year>
          <article-title>Adaptive subgradient methods for online learning and stochastic optimization</article-title>
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          <fpage>2121</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] University of Toronto CSC321 course, lecture 6 slides (Access mode: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf) (18.11.2017)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Kingma</surname>
            <given-names>D</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ba</surname>
            <given-names>J</given-names>
          </string-name>
          <year>2015</year>
          <article-title>Adam: a method for stochastic optimization</article-title>
          (Published as a conference paper at ICLR)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Krasheninnikov</surname>
            <given-names>V R</given-names>
          </string-name>
          <year>2003</year>
          <source>Foundations of Image Processing Theory</source>
          (Ulyanovsk: UlSTU) p
          <fpage>150</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Smelkina</surname>
            <given-names>N A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kosarev</surname>
            <given-names>R N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikonorov</surname>
            <given-names>A V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bairikov</surname>
            <given-names>I M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ryabov</surname>
            <given-names>K N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Avdeev</surname>
            <given-names>A V</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kazanskiy</surname>
            <given-names>N L</given-names>
          </string-name>
          <year>2017</year>
          <article-title>Reconstruction of anatomical structures using statistical shape modeling</article-title>
          <source>Computer Optics</source>
          <volume>41</volume>
          (
          <issue>6</issue>
          )
          <fpage>897</fpage>
          -
          <lpage>904</lpage>
          DOI: 10.18287/2412-6179-2017-41-6-897-904
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>