Numerical solution of the dynamic incentive problem in discrete time taking into account the learning curve effect

Oleg Pavlov
Institute of Economics and Management
Samara National Research University
Samara, Russia
pavlov@ssau.ru

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020), Data Science

Abstract: The paper considers the dynamic incentive problem in discrete time, taking into account the learning curve effect. The problem is formulated as a dynamic game between the leader (the center) and the performers (the agents). To solve it, the principle of cost compensation is applied, which reduces the original problem to an optimal control problem in discrete time. Numerical solutions for various learning curve models are obtained using the Bellman dynamic programming method. The impact of the discount rate on the solution of the incentive problem is also studied.

Keywords: dynamic incentive problem, inverse Stackelberg game, learning curve effect, Bellman dynamic programming

I. INTRODUCTION

The article discusses the dynamic game problem of stimulating executors who perform a production task in the context of new product development. The development of new products at industrial enterprises is characterized by the learning curve effect: the time spent by employees (laboriousness) on repeatedly performing the same production operations decreases.

The problem of stimulating executors is one of the most important in management theory. The management (the center) should choose an incentive system, based on a forecast of the agent's actions, that ensures the fulfillment of the center's economic interests. The executor (the agent) chooses an action (volume of work) based on his own economic interests.

Dynamic problems of interaction between unequal players are considered in the active systems theory [1], in the information theory of hierarchical systems [2–4] and in the dynamic games theory developed by international authors [5–11]. The stimulation problem has received different names in different theories. In the active systems theory it is the incentive problem, in publications of foreign authors on game theory it is the inverse Stackelberg game, and in the information theory of hierarchical systems it is the Germeyer game.

The active systems theory [1] offers an approach called the principle of agent's cost compensation. The center pays material remuneration to the agent, compensating his costs, if the agent chooses the center's optimal planned trajectory, and pays no compensation otherwise. The initial problem is divided into two tasks: the choice of the incentive system and the solution of the optimal control problem. In [12], results are presented that generalize the theorems from the monograph [1].

The hierarchical systems theory [2–4] suggests an approach in which the center chooses a program of joint actions with the agent and a punishment for deviation from this program. When making a decision, the center proceeds from the principle of maximum guaranteed result. As a result, the initial problem is transformed into an optimal control problem.

In this article, based on the approach of [1, 12], the dynamic incentive problem of agents taking into account the learning curve effect is formulated and numerically solved using the Bellman dynamic programming method.

II. STATEMENT AND ALGORITHM FOR SOLVING THE DYNAMIC INCENTIVE PROBLEM OF AGENTS

A two-level dynamic manufacturing system consisting of a center and n independent agents is considered. Agents produce parts from which the finished product is then assembled. Labor costs and financial incentives for agents depend only on their own actions. This article applies the principle of game decomposition [1], which allows the management of the i-th agent to be considered independently, without taking into account the interaction of agents with each other. The state of the dynamic production system depends on the actions of agents, and the center affects the managed system only through the payment of material remuneration to agents.

The dynamics of part production by the i-th agent is described by a discrete equation:

$$x_t = x_{t-1} + u_t, \quad t = 1, \dots, T,$$

where $x_t$ is the cumulative production volume of the part in the time period t, t is the number of the time period, $u_t$ is the production volume of the part in the period t, and T is the number of time periods considered.

The number of parts manufactured before the start of mass production is known:

$$x_0 = X_0.$$

In the final time period, the cumulative volume of parts must be equal to the specified value:

$$x_T = X_0 + R,$$

where R is the specified number of parts.

Restrictions are imposed on the production volume of the part:

$$0 \le u_t \le X_0 + R - x_{t-1}, \quad t = 1, \dots, T.$$

The target function of the center is to maximize the discounted total difference between the income from the manufactured parts and the costs of the agent's material compensation:

$$J_p = \sum_{t=1}^{T} \frac{1}{(1+r)^t} \left[ p\,u_t - \sigma(x_t) \right] \to \max, \tag{1}$$

where p is the part price, $\sigma(x_t)$ is the center incentive function, and r is the center discount rate.

The incentive function of the center is a rule according to which material remuneration is assigned to the agent for the amount of work performed. The center manages the production process through the mechanism of material incentives $\sigma(x_t)$, economically encouraging agents to fulfill the planned production volumes.

The discount rate takes into account the time preferences of the center (agent) regarding the value of cash flows. The more distant in time the cash flow, the cheaper it is for the center (agent).

The target function of the agent is to maximize the discounted total difference between material remuneration and labor costs, expressed in monetary form:

$$J_a = \sum_{t=1}^{T} \frac{1}{(1+r)^t} \left[ \sigma(x_t) - C_t(u_t, x_{t-1}) \right] \to \max, \tag{2}$$

where r is the agent discount rate and $C_t(u_t, x_{t-1})$ is the agent labor costs.

Agent labor costs are determined by the following equation:

$$C_t(u_t, x_{t-1}) = s\,c_t\,u_t, \tag{3}$$

where s is the cost of one hour of the agent's work and $c_t$ is the laboriousness of manufacturing the part.

The dependence of the part laboriousness on the cumulative production volume is described by various models of the learning curve given in [13]-[15].

In accordance with his economic interests, the agent selects the part production volumes that maximize his target function (2). The center's task is to choose the optimal incentive system under which the agent will produce the part production volumes that maximize the center target function (1).

To solve the formulated control problem, the principle of cost compensation is applied [1, 12]. The solution algorithm consists in dividing the initial problem into two tasks: choosing a compensatory incentive system and solving the optimal control problem with the objective function equal to the difference between the center's income and the agent's labor costs.

1. The choice of a compensatory incentive system.

The center selects a compensatory incentive system, which consists in compensating the agent's costs if the agent chooses the center's optimal planned production volume $x_t^{opt}$, and in the absence of material payments otherwise:

$$\sigma(x_t) = \begin{cases} C_t(u_t, x_{t-1}), & \text{if } x_t = x_t^{opt}, \; t = 1, \dots, T, \\ 0, & \text{if } x_t \ne x_t^{opt}, \; t = 1, \dots, T. \end{cases}$$

2. The solution of the optimal control problem with the target function equal to the difference between the center income and the agent labor costs.

To encourage the agent to choose the planned production volume, the center pays a material remuneration equal to the agent's costs:

$$\sigma(x_t) = C_t(u_t, x_{t-1}). \tag{4}$$

We substitute formula (4) into the target function of the center, taking into account (3):

$$J_p = \sum_{t=1}^{T} \frac{1}{(1+r)^t} \left[ p - s\,c_t \right] u_t \to \max.$$

Since the part price p is constant, the center can increase its profit only by minimizing the total cost of paying the agent's material remuneration. The target function of the center takes the following form:

$$J_p = \sum_{t=1}^{T} \frac{1}{(1+r)^t}\, s\,c_t\,u_t \to \min.$$

Thus, the initial dynamic incentive problem is reduced to the optimal control problem:

$$J_p = \sum_{t=1}^{T} \frac{1}{(1+r)^t}\, s\,c_t\,u_t \to \min, \tag{5}$$

$$x_t = x_{t-1} + u_t, \quad t = 1, \dots, T, \tag{6}$$

$$x_0 = X_0, \tag{7}$$

$$x_T = X_0 + R, \tag{8}$$

$$0 \le u_t \le X_0 + R - x_{t-1}, \quad t = 1, \dots, T. \tag{9}$$

The center's task is to select the optimal production volumes of parts $u_t^{opt}$, subject to restrictions (9), under which the production process (6) moves from the initial state (7) to the final state (8) and the minimum of the center's target function (5) is achieved.

The formulated optimal control problem (5)-(9) was solved using the Bellman dynamic programming method [16], implemented in the Pascal programming language.

III. THE RESULTS OF THE NUMERICAL SOLUTION OF THE DYNAMIC INCENTIVE PROBLEM OF AGENTS

The numerical solution of the optimal control problem is carried out on the example of the production of parts at the enterprise Salut JSC. Based on the enterprise data, regression models of part laboriousness were constructed: power, exponential and logistic.

Power laboriousness model:

$$c_t = 42.64\, x_{t-1}^{-0.3}.$$

Exponential laboriousness model:

$$c_t = 9.17 + 6.16\, e^{-0.03 x_{t-1}}.$$

Logistic laboriousness model:

$$c_t = 55.10 + 36.61\, \frac{1}{1 + 0.017\, e^{0.05 x_{t-1}}}.$$

To solve the problem, the following data was used: the number of time periods T = 12 months, the production volume of parts R = 240 pcs., and the production experience before serial production $x_0 = 1$ pc. The discrete step of the part production volume in the dynamic programming method is 1 pc.

Numerical solutions of the optimal control problem for the power, exponential and logistic laboriousness models are presented in Fig. 1-3. The figures show the optimal trajectories of cumulative production volumes for various discount rates.

[Figure: optimal cumulative production volume (pcs.) versus time period (months), curves for r = 0%, 10%, 20%, 30%, 40%]
Fig. 1. The dependence of the optimal cumulative production volume on the discount rate for the power-based laboriousness model.

From an analysis of Fig. 1-2, it follows that for the power and exponential laboriousness models the optimal trajectory of the cumulative production volume is a convex curve. The optimal strategy of the center is to shift large production volumes of parts to the last time periods, in which the laboriousness of part production is lower than in the initial ones.

[Figure: optimal cumulative production volume (pcs.) versus time period (months), curves for r = 0%, 5%, 10%, 15%]
Fig. 2. The dependence of the optimal cumulative production volume on the discount rate for the logistic laboriousness model.

With an increase in the discount rate, the center's strategy of shifting large production volumes of parts to the last time periods intensifies. This is due to the "cheaper" cost of the money that the center pays to the agent as a material reward in remote time periods. With large discount rates, the production of parts is deferred from the initial time periods to later ones. It is economically advantageous for the center to postpone the production of parts to late time periods, since in this case its total discounted costs will be minimal.

[Figure: optimal cumulative production volume (pcs.) versus time period (months), curves for r = 0%, 10%, 20%, 30%, 40%]
Fig. 3. The dependence of the optimal cumulative production volume on the discount rate for the exponential laboriousness model.

Analyzing Fig. 3, we conclude that for the logistic laboriousness model in the absence of discounting (r = 0%), the optimal trajectory of the cumulative production volume is a logistic curve. The optimal trajectory of the cumulative production volume consists of two sections: concave and convex.

Fig. 4 shows the optimal trajectories of production volumes for various discount rates r. The optimal strategy of the center in the absence of discounting (r = 0%) is to reduce production volumes on the concave section of the optimal trajectory of the cumulative production volume and to increase production volumes on the convex section of the trajectory. The minimum of the production volume corresponds to the inflection point of the optimal trajectory of the cumulative production volume.

[Figure: optimal production volume (pcs.) versus time period (months), curves for r = 0%, 5%, 10%, 15%]
Fig. 4. The dependence of the optimal production volume on the discount rate for the logistic laboriousness model.

When discounting is taken into account for the logistic laboriousness model, the effect of postponing part production from the initial time periods to later ones is also observed. Discounting leads to the appearance of an additional convex section in the initial time periods on the optimal trajectory of the cumulative production volume.

The optimal trajectory of the cumulative production volume is transformed into a curve of three sections: convex, concave and convex. The optimal strategy of the center is to increase production volumes on convex sections of the trajectory and to decrease them on concave sections. Inflection points correspond to extreme values of production volumes.

IV. CONCLUSION

The paper considers the dynamic incentive problem of executors in discrete time, taking into account the learning curve effect. To solve the problem, the principle of cost compensation has been applied, which consists in dividing the original problem into two tasks: choosing a compensatory incentive system and solving the optimal control problem with the objective function equal to the difference between the income of the center and the labor costs of the agent.

Using the Bellman dynamic programming method, numerical solutions of the optimal control problem are obtained for various laboriousness models. The impact of the discount rate on the solution of the incentive problem was studied.

Based on the numerical study, the following conclusions are formulated:

1. The optimal strategy of the center for the power and exponential learning curve models is to shift large production volumes of parts to the last time periods, in which the laboriousness of part production is lower than in the initial ones.

2. Taking discounting into account for the power and exponential learning curve models leads to an even greater shift of part production volumes to the last time periods.

3. Taking discounting into account for all the considered learning curve models leads to the effect of postponing production from initial periods to later ones.

4. The optimal trajectory of the cumulative production volume in the case of the logistic learning curve model is a curve consisting of several convex and concave sections. The optimal strategy of the center is to increase production volumes on convex sections of the trajectory and to decrease production volumes on concave sections. Inflection points correspond to extreme values of production volumes.

5. Taking discounting into account for the logistic learning curve model leads to a redistribution of part production volumes toward the middle and late time periods.

ACKNOWLEDGMENT

The reported study was funded by RFBR and Samara region according to the research project № 17-46-630606.

REFERENCES

[1] D.A. Novikov, M.I. Smirnov and T.E. Shokhina, "Mechanisms of Dynamic Active Systems Control," Moscow: IPU RAN, 2002, 124 p.
[2] V.A. Gorelik, M.A. Gorelov and A.F. Kononenko, "Analysis of Conflict Situations in Control Systems," Moscow: Radio i svyaz, 1991, 228 p.
[3] V.A. Gorelik and A.F. Kononenko, "Game-theoretic Models of Decision Making in Ecological–economic Systems," Moscow: Radio i svyaz, 1982, 144 p.
[4] M.A. Gorelov and A.F. Kononenko, "Dynamic conflict models III. Hierarchical games," Automation and Remote Control, vol. 2, pp. 89-106, 2015.
[5] T. Basar and G.J. Olsder, "Dynamic Noncooperative Game Theory," Philadelphia: SIAM, 1999, 519 p.
[6] E. Dockner, S. Jorgensen, N.V. Long and G. Sorger, "Differential Games in Economics and Management Science," Cambridge: Cambridge University Press, 2000, 382 p.
[7] Y.-C. Ho, P. Luh and R. Muralidharan, "Information Structure, Stackelberg Games, and Incentive Controllability," IEEE Trans. Automat. Control., vol. 26, no. 2, pp. 454-460, 1981.
[8] G.J. Olsder, "Phenomena in inverse Stackelberg games. Part 1: static problems," J. Optim. Theory Appl., vol. 143, no. 3, pp. 589-600, 2009.
[9] G.J. Olsder, "Phenomena in inverse Stackelberg games. Part 2: dynamic problems," J. Optim. Theory Appl., vol. 143, no. 3, pp. 601-618, 2009.
[10] N. Groot, B. De Schutter and H. Hellendoorn, "Reverse Stackelberg games. Part I: basic framework," Proc. of the IEEE Int. Conf. on Control Applications, pp. 421-426, 2012.
[11] N. Groot, B. De Schutter and H. Hellendoorn, "Reverse Stackelberg games. Part II: results and open issues," Proc. of the IEEE Int. Conf. on Control Applications, pp. 427-432, 2012.
[12] D.B. Rokhlin and G.A. Ougolnitsky, "Stackelberg equilibrium in a dynamic stimulation model with complete information," Automation and Remote Control, vol. 79, no. 4, pp. 701-712, 2018.
[13] A. Badiru, "Computational survey of univariate and multivariate learning curve models," IEEE Transactions on Engineering Management, vol. 39, no. 2, pp. 176-188, 1992.
[14] L.E. Yelle, "The learning curve: historical review and comprehensive survey," Decision Sciences, vol. 10, no. 2, pp. 302-328, 1979.
[15] Y. Jaber, "Learning Curves: Theory, Models, and Applications," Boca Raton: CRC Press, 2011, 476 p.
[16] R. Bellman, "Dynamic Programming," Moscow: Izdatelstvo inostrannoy literatury, 1960.
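
As an illustration of how the optimal control problem (5)-(9) can be attacked with backward Bellman recursion, a minimal Python sketch is given below. It is not the Pascal implementation reported in the paper. The function names (`power_curve`, `exponential_curve`, `plan_cost`, `solve_incentive_dp`) are hypothetical, the hourly cost is assumed to be s = 1 (a positive constant factor does not change the optimal plan), the discount rate r is assumed to be expressed per time period, and the learning curve coefficients are taken from Section III with negative exponents assumed so that laboriousness decreases with cumulative volume. The state grid is the cumulative volume in steps of 1 pc., and the value function stores the minimal discounted cost of the remaining periods.

```python
import math


def power_curve(x):
    # Hours per part after x parts have been produced (negative exponent assumed).
    return 42.64 * x ** -0.3


def exponential_curve(x):
    # Hours per part after x parts have been produced (decaying exponent assumed).
    return 9.17 + 6.16 * math.exp(-0.03 * x)


def plan_cost(plan, curve, x0, r, s=1.0):
    """Discounted labor cost of a plan u_1..u_T, i.e. objective (5)."""
    total, x = 0.0, x0
    for t, u in enumerate(plan, start=1):
        total += s * curve(x) * u / (1.0 + r) ** t
        x += u  # state equation (6)
    return total


def solve_incentive_dp(curve, T, R, x0, r, s=1.0):
    """Backward Bellman recursion on the grid x = x0, x0 + 1, ..., x0 + R.

    Returns the optimal plan [u_1, ..., u_T] and the minimal value of (5).
    """
    inf = float("inf")
    # value[k]: minimal discounted cost of periods t+1..T when x_t = x0 + k.
    value = [inf] * (R + 1)
    value[R] = 0.0  # terminal condition (8): x_T = x0 + R
    policy = []
    for t in range(T, 0, -1):
        discount = 1.0 / (1.0 + r) ** t
        new_value = [inf] * (R + 1)
        best_u = [0] * (R + 1)
        for k in range(R + 1):
            unit_cost = discount * s * curve(x0 + k)
            for u in range(R - k + 1):  # restriction (9)
                candidate = unit_cost * u + value[k + u]
                if candidate < new_value[k]:
                    new_value[k], best_u[k] = candidate, u
        value = new_value
        policy.append(best_u)
    policy.reverse()  # policy[t - 1][k] is the best u_t at x_{t-1} = x0 + k

    # Forward pass from the initial state (7) recovers the optimal plan.
    plan, k = [], 0
    for t in range(T):
        u = policy[t][k]
        plan.append(u)
        k += u
    return plan, value[0]


if __name__ == "__main__":
    # Data from Section III: T = 12 months, R = 240 pcs., x0 = 1 pc.
    for rate in (0.0, 0.1):
        plan, cost = solve_incentive_dp(power_curve, T=12, R=240, x0=1, r=rate)
        print(f"r={rate:.0%} plan={plan} cost={cost:.2f}")
```

The sketch keeps one value array per stage and enumerates every admissible production volume, which is affordable for the grid sizes used in Section III. Other learning curve models can be passed in as the `curve` argument.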