Numerical solution of the dynamic incentive problem in discrete time taking into account the learning curve effect

Oleg Pavlov
Institute of economics and management
Samara National Research University
Samara, Russia
pavlov@ssau.ru

Abstract: This paper considers a dynamic incentive problem in discrete time that takes the learning curve effect into account. The problem is formulated as a dynamic game between a leader (the center) and performers (the agents). The principle of cost compensation is applied, which reduces the original problem to an optimal control problem in discrete time. Numerical solutions for several learning curve models are obtained with the Bellman dynamic programming method, and the impact of the discount rate on the solution of the incentive problem is studied.

Keywords: dynamic incentive problem, inverse Stackelberg game, learning curve effect, Bellman dynamic programming

I. INTRODUCTION

The article addresses a dynamic game-theoretic problem of motivating executors who perform a production task during new product development. New product development at industrial enterprises is characterized by the learning curve effect: the time that employees spend on repeatedly performed production operations (the laboriousness) decreases as the operations are repeated.

The problem of incentivizing executors is one of the most important in management theory. The management (the center) must choose an incentive system, based on a forecast of the agent’s actions, that secures its own economic interests. The executor (the agent) chooses an action (a volume of work) based on his own economic interests.

Dynamic problems of interaction between unequal players are considered in the active systems theory [1], in the information theory of hierarchical systems [2–4] and in the dynamic games theory developed by international authors [5–11]. The incentive problem goes by different names in these theories: it is the incentive problem in the active systems theory, the inverse Stackelberg game in the international game theory literature, and the Germeyer game in the information theory of hierarchical systems.

The active systems theory [1] offers an approach called the principle of compensating the agent’s costs. The center pays the agent a material remuneration that compensates his costs if he follows the optimal planned trajectory of the center, and pays nothing otherwise. The original problem is thus divided into two tasks: the choice of the incentive system and the solution of an optimal control problem. Results that generalize the theorems of the monograph [1] are presented in [12].

The hierarchical systems theory [2–4] suggests an approach in which the center chooses a program of joint actions with the agent and punishes deviations from this program. When making a decision, the center proceeds from the principle of the maximum guaranteed result. As a result, the original problem is transformed into an optimal control problem.

In this article, following the approach of [1, 12], the dynamic incentive problem for agents with the learning curve effect is formulated and solved numerically using the Bellman dynamic programming method.

II. STATEMENT AND ALGORITHM FOR SOLVING THE DYNAMIC INCENTIVE PROBLEM OF AGENTS

A two-level dynamic manufacturing system consisting of a center and n independent agents is considered. The agents produce parts from which the finished product is then assembled. The labor costs and financial incentives of each agent depend only on his own actions. This article applies the principle of game decomposition [1], which allows the control of the i-th agent to be considered independently, without taking into account the interaction of agents with each other. The state of the dynamic production system depends on the actions of the agents, and the center affects the controlled system only through the payment of material remuneration to the agents.

The dynamics of part production by the i-th agent are described by the discrete equation

$$x_t = x_{t-1} + u_t, \quad t = 1, \dots, T,$$

where $x_t$ is the cumulative production volume of the part in time period t, t is the number of the time period, $u_t$ is the production volume of the part in period t, and T is the number of time periods considered.

The number of parts manufactured before the start of mass production is known:

$$x_0 = X_0.$$

In the final time period, the cumulative volume of parts must equal the specified value:

$$x_T = X_0 + R,$$

where R is the specified number of parts.

The production volume of the part is subject to the restrictions

$$0 \le u_t \le X_0 + R - x_{t-1}, \quad t = 1, \dots, T.$$

The objective of the center is to maximize the discounted total difference between the income from the manufactured parts and the cost of the agent’s material remuneration:

$$J_p = \sum_{t=1}^{T} \frac{1}{(1+r)^t} \left[ p u_t - \sigma(x_t) \right] \to \max, \tag{1}$$


Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
Data Science

where p is the part price, $\sigma(x_t)$ is the incentive function of the center, and r is the discount rate of the center.

The incentive function of the center is the rule by which a material remuneration is assigned to the agent for the amount of work performed. The center manages the production process through the mechanism of material incentives $\sigma(x_t)$, economically encouraging agents to fulfill the planned production volumes.

The discount rate captures the time preferences of the center (agent) with respect to the value of cash flows: the more distant in time a cash flow is, the less it is worth to the center (agent).

The objective of the agent is to maximize the discounted total difference between the material remuneration and the labor costs, expressed in monetary form:

$$J_a = \sum_{t=1}^{T} \frac{1}{(1+r)^t} \left[ \sigma(x_t) - C_t(u_t, x_{t-1}) \right] \to \max, \tag{2}$$

where r is the discount rate of the agent and $C_t(u_t, x_{t-1})$ is the labor cost of the agent.

The labor costs of the agent are determined by the equation

$$C_t(u_t, x_{t-1}) = s c_t u_t, \tag{3}$$

where s is the cost of one hour of the agent’s work and $c_t$ is the laboriousness of manufacturing the part.

The dependence of the part laboriousness on the cumulative production volume is described by various learning curve models given in [13]-[15].

In accordance with his economic interests, the agent selects the production volumes that maximize his objective function (2). The task of the center is to choose the optimal incentive system under which the agent selects production volumes that maximize the objective function of the center (1).

To solve the formulated control problem, the principle of cost compensation is applied [1, 12]. The solution algorithm consists in dividing the original problem into two tasks: choosing a compensatory incentive system and solving the optimal control problem with the objective function equal to the difference between the center’s income and the agent’s labor costs.

1. The choice of a compensatory incentive system.

The center selects a compensatory incentive system, which compensates the agent’s costs if he chooses the optimal planned production volume of the center $x_t^{opt}$ and pays nothing otherwise:

$$\sigma(x_t) = \begin{cases} C_t(u_t, x_{t-1}), & \text{if } x_t = x_t^{opt}, \; t = 1, \dots, T, \\ 0, & \text{if } x_t \ne x_t^{opt}, \; t = 1, \dots, T. \end{cases}$$

2. The solution of the optimal control problem with the objective function equal to the difference between the center’s income and the agent’s labor costs.

To encourage the agent to choose the planned production volume, the center pays a material remuneration equal to the agent’s costs:

$$\sigma(x_t) = C_t(u_t, x_{t-1}). \tag{4}$$

We substitute formula (4) into the objective function of the center, taking into account (3):

$$J_p = \sum_{t=1}^{T} \frac{1}{(1+r)^t} \left[ p - s c_t \right] u_t \to \max.$$

Since the part price p is constant and the total output R is fixed by the terminal condition, the center’s income is treated as given, and the center can increase its profit only by minimizing the total discounted cost of the agent’s material remuneration. The objective function of the center takes the following form:

$$J_p = \sum_{t=1}^{T} \frac{1}{(1+r)^t} \, s c_t u_t \to \min.$$

Thus, the original dynamic incentive problem is reduced to the optimal control problem:

$$J_p = \sum_{t=1}^{T} \frac{1}{(1+r)^t} \, s c_t u_t \to \min, \tag{5}$$

$$x_t = x_{t-1} + u_t, \quad t = 1, \dots, T, \tag{6}$$

$$x_0 = X_0, \tag{7}$$

$$x_T = X_0 + R, \tag{8}$$

$$0 \le u_t \le X_0 + R - x_{t-1}, \quad t = 1, \dots, T. \tag{9}$$

The task of the center is to select the optimal production volumes $u_t^{opt}$ that satisfy restrictions (9), move the production process (6) from the initial state (7) to the final state (8), and attain the minimum of the objective function (5).

The formulated optimal control problem (5)-(9) was solved using the Bellman dynamic programming method [16], implemented in the Pascal programming language.

III. THE RESULTS OF THE NUMERICAL SOLUTION OF THE DYNAMIC INCENTIVE PROBLEM OF AGENTS

The numerical solution of the optimal control problem is illustrated with the production of parts at the enterprise Salut JSC. Regression models of the laboriousness of manufacturing parts are constructed from the enterprise data: power, exponential and logistic.

Power laboriousness model:

$$c_t = 42.64 \, x_{t-1}^{-0.3}.$$

Exponential laboriousness model:

$$c_t = 9.17 + 6.16 \, e^{-0.03 x_{t-1}}.$$

Logistic laboriousness model:

$$c_t = 55.10 + 36.61 \left[ \frac{1}{1 + 0.017 \, e^{0.05 x_{t-1}}} \right].$$

The following data were used to solve the problem: the number of time periods T = 12 months, the production volume of parts R = 240 pcs., and the production experience before serial production $x_0 = 1$ pcs. The discrete step of the parts production volume in the dynamic programming method is 1 pcs.

Numerical solutions of the optimal control problem for the power, exponential and logistic laboriousness models are presented in Fig. 1-3. The figures show the optimal trajectories of cumulative production volumes for various discount rates.
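The backward recursion behind problem (5)-(9) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the Pascal implementation used in the paper: the function names (`power_curve`, `exponential_curve`, `logistic_curve`, `solve_incentive_dp`) are hypothetical, the hourly cost `s = 1.0` is a placeholder because its value is not given in the text, and the signs in the three regression formulas are taken on the assumption that laboriousness decreases as the cumulative volume grows.

```python
import math

# Learning curve models c(x_{t-1}); coefficient signs are an assumption
# (laboriousness is taken to decrease with cumulative volume).
def power_curve(x):
    return 42.64 * x ** -0.3

def exponential_curve(x):
    return 9.17 + 6.16 * math.exp(-0.03 * x)

def logistic_curve(x):
    return 55.10 + 36.61 / (1.0 + 0.017 * math.exp(0.05 * x))

def solve_incentive_dp(curve, T=12, R=240, X0=1, r=0.0, s=1.0):
    """Minimize sum_t s*c(x_{t-1})*u_t/(1+r)^t subject to (6)-(9), step 1 pc.

    s is a hypothetical hourly cost; it scales the objective only.
    Returns (minimal cost, production volumes u_1..u_T, trajectory x_0..x_T).
    """
    inf = float("inf")
    # value[k]: minimal discounted cost-to-go when x = X0 + k.
    # After the last period only the terminal state x_T = X0 + R is allowed.
    value = [inf] * (R + 1)
    value[R] = 0.0
    policy = []
    for t in range(T, 0, -1):                      # backward over periods
        discount = 1.0 / (1.0 + r) ** t
        new_value = [inf] * (R + 1)
        best_u = [0] * (R + 1)
        for k in range(R + 1):                     # state x_{t-1} = X0 + k
            unit_cost = discount * s * curve(X0 + k)
            for u in range(R - k + 1):             # restriction (9)
                cost = unit_cost * u + value[k + u]
                if cost < new_value[k]:
                    new_value[k] = cost
                    best_u[k] = u
        value = new_value
        policy.append(best_u)
    policy.reverse()                               # policy[0] is period t = 1

    # Forward pass: recover the optimal volumes and cumulative trajectory.
    k, volumes, trajectory = 0, [], [X0]
    for t in range(T):
        u = policy[t][k]
        volumes.append(u)
        k += u
        trajectory.append(X0 + k)
    return value[0], volumes, trajectory

if __name__ == "__main__":
    for name, curve in [("power", power_curve),
                        ("exponential", exponential_curve),
                        ("logistic", logistic_curve)]:
        cost, volumes, trajectory = solve_incentive_dp(curve, r=0.0)
        print(name, round(cost, 2), volumes)
```

Calling `solve_incentive_dp(logistic_curve, r=0.05)` repeats the computation for a 5% discount rate. With T = 12 and R = 240 the state space has only 241 points per period, so a plain triple loop is sufficient and no vectorization is needed.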


VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020)


[Figure 1 plot: optimal cumulative production volume (pcs.) versus time period (months); curves for r = 0%, 10%, 20%, 30%, 40%.]
Fig. 1. The dependence of the optimal cumulative production volume on the discount rate for the power-based laboriousness model.

From an analysis of Fig. 1 and Fig. 3, it follows that for the power-based and exponential laboriousness models the optimal trajectory of the cumulative production volume is a convex curve. The optimal strategy of the center is to shift large production volumes of parts to the last time periods, in which the laboriousness of the parts is lower than in the initial ones.

[Figure 2 plot: optimal cumulative production volume (pcs.) versus time period (months); curves for r = 0%, 5%, 10%, 15%.]
Fig. 2. The dependence of the optimal cumulative production volume on the discount rate for the logistic laboriousness model.

As the discount rate increases, the center’s strategy of shifting large production volumes of parts to the last time periods intensifies. This is due to the “cheaper” cost of the money that the center pays to the agent as a material reward in remote time periods. At large discount rates, production of parts is deferred from the initial time periods to later ones. It is economically advantageous for the center to postpone the production of parts to late time periods, since in this case its total discounted costs are minimal.

[Figure 3 plot: optimal cumulative production volume (pcs.) versus time period (months); curves for r = 0%, 10%, 20%, 30%, 40%.]
Fig. 3. The dependence of the optimal cumulative production volume on the discount rate for the exponential laboriousness model.

Analyzing Fig. 2, we conclude that for the logistic laboriousness model in the absence of discounting (r = 0%), the optimal trajectory of the cumulative production volume is a logistic curve. It consists of two sections: concave and convex.

Fig. 4 shows the optimal trajectories of production volumes for various discount rates r. In the absence of discounting (r = 0%), the optimal strategy of the center is to reduce production volumes on the concave section of the optimal trajectory of the cumulative production volume and to increase them on the convex section. The minimum production volume corresponds to the inflection point of the optimal trajectory of the cumulative production volume.

[Figure 4 plot: optimal production volume (pcs.) versus time period (months); curves for r = 0%, 5%, 10%, 15%.]
Fig. 4. The dependence of the optimal production volume on the discount rate for the logistic laboriousness model.

When discounting is taken into account for the logistic laboriousness model, the postponement of parts production from the initial time periods to later ones is also observed. Discounting leads to the appearance of an additional convex section in the initial time periods on the optimal trajectory of the cumulative production volume.

The optimal trajectory of the cumulative production volume is transformed into a curve of three sections: convex, concave and convex. The optimal strategy of the center is to increase production volumes on the convex sections of the trajectory and to decrease them on the concave section. Inflection points correspond to extreme values of production volumes.

IV. CONCLUSION

The paper considers the dynamic incentive problem for executors in discrete time, taking into account the learning curve effect. To solve the problem, the principle of cost compensation has been applied, which consists in dividing the original problem into two tasks: choosing a compensatory incentive system and solving the optimal control problem with the objective function equal to the difference between the income of the center and the labor costs of the agent.

Using the Bellman dynamic programming method, numerical solutions of the optimal control problem are obtained for various laboriousness models. The impact of the discount rate on the solution of the incentive problem has been studied.

Based on the numerical study, the following conclusions are formulated:

1. The optimal strategy of the center for the power-based and exponential learning curve models is to



redistribute large production volumes of parts to the last time periods, in which the laboriousness of the parts is lower than in the initial ones.

2. Taking discounting into account for the power-based and exponential learning curve models leads to an even greater redistribution of the production volumes of parts to the last time periods.

3. Taking discounting into account for all the considered learning curve models leads to the postponement of production from initial periods to later ones.

4. The optimal trajectory of the cumulative production volume in the case of the logistic learning curve model is a curve consisting of several convex and concave sections. The optimal strategy of the center is to increase production volumes on convex sections of the trajectory and to decrease production volumes on concave sections. Inflection points correspond to extreme values of production volumes.

5. Taking discounting into account for the logistic learning curve model leads to a redistribution of production volumes of parts to the middle and late time periods.

ACKNOWLEDGMENT

The reported study was funded by RFBR and Samara region according to the research project No. 17-46-630606.

REFERENCES

[1] D.A. Novikov, M.I. Smirnov and T.E. Shokhina, “Mechanisms of Dynamic Active Systems Control,” Moscow: IPU RAN, 2002, 124 p.
[2] V.A. Gorelik, M.A. Gorelov and A.F. Kononenko, “Analysis of Conflict Situations in Control Systems,” Moscow: Radio i svyaz, 1991, 228 p.
[3] V.A. Gorelik and A.F. Kononenko, “Game-theoretic Models of Decision Making in Ecological–economic Systems,” Moscow: Radio i svyaz, 1982, 144 p.
[4] M.A. Gorelov and A.F. Kononenko, “Dynamic conflict models III. Hierarchical games,” Automation and Remote Control, vol. 2, pp. 89-106, 2015.
[5] T. Basar and G.J. Olsder, “Dynamic Noncooperative Game Theory,” Philadelphia: SIAM, 1999, 519 p.
[6] E. Dockner, S. Jorgensen, N.V. Long and G. Sorger, “Differential Games in Economics and Management Science,” Cambridge: Cambridge University Press, 2000, 382 p.
[7] Y.-C. Ho, P. Luh and R. Muralidharan, “Information Structure, Stackelberg Games, and Incentive Controllability,” IEEE Trans. Automat. Control., vol. 26, no. 2, pp. 454-460, 1981.
[8] G.J. Olsder, “Phenomena in inverse Stackelberg games. Part 1: static problems,” J. Optim. Theory Appl., vol. 143, no. 3, pp. 589-600, 2009.
[9] G.J. Olsder, “Phenomena in inverse Stackelberg games. Part 2: dynamic problems,” J. Optim. Theory Appl., vol. 143, no. 3, pp. 601-618, 2009.
[10] N. Groot, B. De Schutter and H. Hellendoorn, “Reverse Stackelberg games. Part I: basic framework,” Proc. of the IEEE Int. Conf. on Control Applications, pp. 421-426, 2012.
[11] N. Groot, B. De Schutter and H. Hellendoorn, “Reverse Stackelberg games. Part II: results and open issues,” Proc. of the IEEE Int. Conf. on Control Applications, pp. 427-432, 2012.
[12] D.B. Rokhlin and G.A. Ougolnitsky, “Stackelberg equilibrium in a dynamic stimulation model with complete information,” Automation and Remote Control, vol. 79, no. 4, pp. 701-712, 2018.
[13] A. Badiru, “Computational survey of univariate and multivariate learning curve models,” IEEE Transactions on Engineering Management, vol. 39, no. 2, pp. 176-188, 1992.
[14] L.E. Yelle, “The learning curve: historical review and comprehensive survey,” Decision Sciences, vol. 10, no. 2, pp. 302-328, 1979.
[15] Y. Jaber, “Learning Curves: Theory, Models, and Applications,” Boca Raton: CRC Press, 2011, 476 p.
[16] R. Bellman, “Dynamic Programming,” Moscow: Izdatelstvo inostrannoy literatury, 1960.

