Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021

ON DEEP LEARNING FOR OPTION PRICING IN LOCAL VOLATILITY MODELS

S.G. Shorokhov

Peoples' Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya St, Moscow, 117198, Russia

E-mail: shorokhov-sg@rudn.ru

We study the neural network approximation of the solution to the boundary value problem for the Black-Scholes-Merton partial differential equation for a European call option price when the model volatility is a function of the underlying asset price and time (a local volatility model). The strike price and expiry day of the option are assumed to be fixed. An approximation to the option price in a local volatility model is obtained via deep learning with the deep Galerkin method (DGM), which makes use of a neural network of special architecture and stochastic gradient descent on a sequence of random time and underlying-price points. The architecture of the neural network and the algorithm of its training for option pricing in local volatility models are described in detail. A computational experiment with the DGM neural network is performed to evaluate the quality of the neural network approximation for the hyperbolic sine local volatility model, for which an exact closed-form option price is known. The quality of the neural network approximation is estimated with the mean absolute error, the mean squared error, and the coefficient of determination. The computational experiment demonstrates that the DGM neural network approximation converges to the European call option price of the local volatility model with acceptable accuracy.

Keywords: partial differential equation, local volatility model, option price, neural network

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

In a local volatility model (LVM) [1], in contrast to the Black-Scholes constant volatility model [2, 3], the volatility depends on the underlying asset price S and time t. Well-known LVMs with exact closed-form solutions include the CEV model [4], the shifted lognormal model [5], and the normal model (a counterpart of the Ornstein–Uhlenbeck model [6]).

When pricing derivatives in an LVM, a boundary (terminal) value problem for the Black-Scholes-Merton (BSM) partial differential equation (PDE) must be solved. Exact closed-form solutions of the terminal value problem for the BSM PDE are known only in a few special cases; in the general case, numerical methods such as binomial trees, Monte Carlo simulation, Fourier methods, or finite differences are required. Alternatively, derivative prices in an LVM can be approximated with an artificial neural network (ANN). The idea of using ANNs for option pricing is several decades old (see [7] and the references therein); however, the need to improve the quality of option price approximation stimulates further research.

The Deep Galerkin Method (DGM), introduced recently in [8] for the solution of PDEs, makes use of a neural network of special architecture and stochastic gradient descent (SGD) [9] on a sequence of random time and space points. The method has been successfully applied to various PDEs [8], including the BSM PDE with constant volatility.
Our goal is to study the application of the DGM approach [8] to option pricing when the volatility function is not constant, and to evaluate the quality of the ANN approximation for an LVM with a known exact closed-form analytical solution.

2. Deep option pricing with local volatility models

To price a European call option with strike price K and expiry day T in an LVM with volatility function σ(S, t) and risk-free interest rate r > 0, the solution of the BSM PDE [2, 3]

$$\frac{\partial u}{\partial t} + r S \frac{\partial u}{\partial S} + \frac{1}{2}\,\sigma^{2}(S, t)\, S^{2}\, \frac{\partial^{2} u}{\partial S^{2}} - r\, u = 0 \qquad (1)$$

with the terminal condition

$$u(S, T) = \max(S - K, 0) \qquad (2)$$

is to be determined. The strike price K and expiry day T are assumed to be fixed.

To construct the ANN approximation to PDE (1) with condition (2) using the DGM approach, a neural network of special architecture [8] is built and trained. The architecture of the ANN is similar to the architectures of LSTM [10] and Highway [11] networks. It consists of the layers shown in Fig. 1: an input layer, d hidden (LSTM-like) layers, and an output layer.

Figure 1. Architecture of the DGM neural network

The input to the DGM ANN is a set of randomly sampled price-time points x = (S, t). In the input layer, the price-time points x are transformed into the output

$$X_{0} = \sigma(w_{0}\, x + b_{0})$$

with a nonlinear activation function σ and input layer parameters w₀ and b₀. Each hidden (LSTM-like) layer receives as input the original price-time points x and the output X_{i-1} of the previous layer, which are processed with the following transformations:

$$\begin{aligned}
Z_{i} &= \sigma(u_{i}^{z}\, x + w_{i}^{z}\, X_{i-1} + b_{i}^{z}),\\
R_{i} &= \sigma(u_{i}^{r}\, x + w_{i}^{r}\, X_{i-1} + b_{i}^{r}),\\
G_{i} &= \sigma(u_{i}^{g}\, x + w_{i}^{g}\, X_{i-1} + b_{i}^{g}),\\
H_{i} &= \sigma(u_{i}^{h}\, x + w_{i}^{h}\, (X_{i-1} \odot R_{i}) + b_{i}^{h}),
\end{aligned}$$

where ⊙ denotes element-wise multiplication, and the output of the layer is

$$X_{i} = (1 - G_{i}) \odot H_{i} + Z_{i} \odot X_{i-1}.$$

In the output layer, the output X_d of the last LSTM-like layer is transformed into the neural network output y with the linear transform

$$y = f(x; \theta) = w' X_{d} + b',$$

where w′ and b′ are the output layer parameters. The output y of the DGM neural network is the approximation of the option price u at the input price-time points x.

Figure 2. Architecture of an LSTM-like layer in the DGM network

The number of parameters (weights and biases) of the DGM ANN can be counted as follows. Let N be the number of neurons (nodes) in each hidden layer of the DGM neural network. In the input layer, the shape of the weight w₀ is 2 × N and the shape of the bias b₀ is 1 × N. In the hidden LSTM-like layers, the shapes of the weights u_i^z, u_i^g, u_i^r, u_i^h are 2 × N, the shapes of the weights w_i^z, w_i^g, w_i^r, w_i^h are N × N, and the shapes of the biases b_i^z, b_i^g, b_i^r, b_i^h are 1 × N. In the output layer, the shape of the weight w′ is N × 1 and b′ is a scalar. The parameter set θ contains all the weights and biases listed above; summing the d hidden layers (4 matrices of 2N, 4 matrices of N², and 4 biases of N each) with the input and output layers gives the total number of parameters of the DGM neural network:

$$|\theta| = 4\, d\, N (N + 3) + 4 N + 1.$$
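For concreteness, the architecture described above can be sketched in TensorFlow 2, the framework used for the experiment in Section 3. The decomposition into Keras layers, the tanh activation, and the class names below are illustrative assumptions rather than the original implementation:

```python
import tensorflow as tf

class DGMLayer(tf.keras.layers.Layer):
    """One hidden DGM layer with gates Z, R, G and candidate H (cf. Fig. 2)."""
    def __init__(self, n_units, activation="tanh"):
        super().__init__()
        self.sigma = tf.keras.activations.get(activation)
        # Four u-weights acting on x (the bias b is kept in the paired Dense below).
        self.u = [tf.keras.layers.Dense(n_units, use_bias=False) for _ in range(4)]
        # Four w-weights acting on X_{i-1}, each carrying the bias term.
        self.w = [tf.keras.layers.Dense(n_units) for _ in range(4)]

    def call(self, x, X_prev):
        Z = self.sigma(self.u[0](x) + self.w[0](X_prev))      # Z_i
        R = self.sigma(self.u[1](x) + self.w[1](X_prev))      # R_i
        G = self.sigma(self.u[2](x) + self.w[2](X_prev))      # G_i
        H = self.sigma(self.u[3](x) + self.w[3](X_prev * R))  # H_i uses X_{i-1} ⊙ R_i
        return (1.0 - G) * H + Z * X_prev                     # X_i

class DGMNet(tf.keras.Model):
    """Input layer, d DGM layers, and a linear output layer y = w' X_d + b'."""
    def __init__(self, n_units=50, n_layers=3):
        super().__init__()
        self.x_layer = tf.keras.layers.Dense(n_units, activation="tanh")  # X_0
        self.hidden = [DGMLayer(n_units) for _ in range(n_layers)]
        self.out = tf.keras.layers.Dense(1)                               # linear output

    def call(self, S, t):
        x = tf.concat([S, t], axis=1)   # price-time points x = (S, t)
        X = self.x_layer(x)             # X_0 = sigma(w_0 x + b_0)
        for layer in self.hidden:
            X = layer(x, X)             # every layer sees both x and X_{i-1}
        return self.out(X)              # approximation of the option price u(S, t)
```

With n_units = 50 and n_layers = 3 (the configuration of Section 3), this sketch has 4·3·50·53 + 4·50 + 1 = 32001 trainable parameters, in agreement with the count |θ| derived above.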
The DGM neural network is trained with the adaptive algorithm Adam [12], an extension of the classical SGD algorithm. The general outline of the DGM algorithm for the solution of BSM PDE (1)-(2) is given in Algorithm 1.

Algorithm 1: Approximation of the option price in a local volatility model with a DGM neural network

Data: volatility function σ(S, t), strike price K, time to maturity T, risk-free interest rate r, distributions ν₁ and ν₂, absolute tolerance ε > 0.
Result: optimal parameter set θ* for the approximation of the option price in the LVM.
– choose an initial parameter set θ₀ and a learning rate α₀;
repeat
  – generate random time-price points (Sₙ, tₙ) from Ω × [0, T] with distribution ν₁ and random price points S′ₙ from Ω with distribution ν₂, where Ω = [s_l, s_h] ⊂ ℝ;
  – calculate the loss function L(θₙ, ξₙ) at the randomly sampled points ξₙ = {(Sₙ, tₙ), S′ₙ}, where

$$\begin{aligned}
L(\theta_n, \xi_n) \leftarrow{} &\left( \frac{\partial f(S_n, t_n; \theta_n)}{\partial t} + r S_n \frac{\partial f(S_n, t_n; \theta_n)}{\partial S} + \frac{1}{2}\,\sigma^{2}(S_n, t_n)\, S_n^{2}\, \frac{\partial^{2} f(S_n, t_n; \theta_n)}{\partial S^{2}} - r f(S_n, t_n; \theta_n) \right)^{2} \\
&+ \big( f(S'_n, T; \theta_n) - \max(S'_n - K, 0) \big)^{2};
\end{aligned}$$

  – update the parameter set θ by a gradient descent step at the random points ξₙ with learning rate αₙ using the adaptive algorithm Adam [12]:

$$\theta_{n+1} \leftarrow \theta_{n} - \alpha_{n}\, \nabla_{\theta} L(\theta_{n}, \xi_{n});$$

until ‖θₙ₊₁ − θₙ‖ < ε;
θ* ← θₙ₊₁.

As a result of Algorithm 1, an approximation of the price of a European call option in the LVM with volatility function σ(S, t) is obtained in the form u(t, S) = f(t, S; θ*).

3. Computational experiment for the hyperbolic sine LVM

Consider the hyperbolic sine LVM [13], in which the underlying asset price is driven by the SDE

$$dS = r\, S\, dt + \sqrt{2\, r\, S^{2} + \lambda^{2}}\; dW, \qquad r > 0, \;\; \lambda > 0, \qquad (3)$$

with the corresponding BSM PDE for a derivative price

$$\frac{\partial u}{\partial t} + r S \frac{\partial u}{\partial S} + \frac{1}{2}\,\left(2\, r\, S^{2} + \lambda^{2}\right) \frac{\partial^{2} u}{\partial S^{2}} - r\, u = 0. \qquad (4)$$

The goal is to find a DGM estimate of the European option price in the hyperbolic sine LVM (3), compare it with the known analytical option price [13], and evaluate the quality of the approximation.

The DGM ANN for PDE (4) was implemented with the TensorFlow framework [14]. The computational experiment was performed with the following parameters: the number d of hidden layers is 3, the number N of nodes (neurons) per hidden layer is 50, and training consists of 100 stages with 10 SGD steps in each stage; the other model parameters are r = 0.05, λ = 0.25, K = 50, T = 1, S₀ = 0.5.
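To make the training procedure concrete, below is a minimal sketch of one step of Algorithm 1 specialized to PDE (4), for which σ²(S, t) S² = 2 r S² + λ². It assumes the DGMNet sketch given earlier; the uniform sampling distributions ν₁ and ν₂, the price domain Ω = [0, 100], and the batch size are illustrative choices, not necessarily those of the reported experiment:

```python
import tensorflow as tf

r, lam, K, T = 0.05, 0.25, 50.0, 1.0    # experiment parameters from Section 3
s_lo, s_hi = 0.0, 100.0                 # assumed price domain Omega = [s_l, s_h]

model = DGMNet(n_units=50, n_layers=3)  # from the architecture sketch above
optimizer = tf.keras.optimizers.Adam()  # adaptive SGD step of Algorithm 1 [12]

@tf.function
def train_step(n_points=1000):
    # Interior points (S_n, t_n) ~ nu_1 and terminal points S'_n ~ nu_2.
    S = tf.random.uniform((n_points, 1), s_lo, s_hi)
    t = tf.random.uniform((n_points, 1), 0.0, T)
    S_term = tf.random.uniform((n_points, 1), s_lo, s_hi)
    with tf.GradientTape() as theta_tape:
        # Nested tapes give f_t, f_S and the second derivative f_SS.
        with tf.GradientTape() as outer:
            outer.watch(S)
            with tf.GradientTape(persistent=True) as inner:
                inner.watch([S, t])
                f = model(S, t)
            f_S = inner.gradient(f, S)
            f_t = inner.gradient(f, t)
        f_SS = outer.gradient(f_S, S)
        # PDE residual of (4): f_t + r S f_S + (2 r S^2 + lam^2)/2 f_SS - r f.
        pde_res = f_t + r * S * f_S + 0.5 * (2.0 * r * S**2 + lam**2) * f_SS - r * f
        # Terminal condition residual: f(S', T) - max(S' - K, 0).
        term_res = model(S_term, T * tf.ones_like(S_term)) - tf.nn.relu(S_term - K)
        # Monte Carlo estimate of the loss L(theta_n, xi_n) over the batch.
        loss = tf.reduce_mean(tf.square(pde_res)) + tf.reduce_mean(tf.square(term_res))
    grads = theta_tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # Adam update
    return loss
```

Running 100 stages of 10 such steps reproduces the training schedule described above; the stopping criterion ‖θₙ₊₁ − θₙ‖ < ε of Algorithm 1 can be checked between stages.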
The quality of the obtained ANN approximation is characterized by the following metrics:
• mean absolute error (MAE): 0.2014;
• mean squared error (MSE): 0.2483;
• coefficient of determination (R²): 99.93%.

The resulting approximation error, i.e. the difference between the exact analytical option prices in the hyperbolic sine LVM (3) [13] and the option prices predicted by the DGM ANN, is visualized in Fig. 3.

Figure 3. Absolute error surface of the DGM option price estimate (price S from 0 to 100, time t from 0 to 1; absolute option price error up to about 4.6)

The computational experiment shows that the approximation obtained with the DGM ANN predicts option prices in the hyperbolic sine LVM (3) with acceptable accuracy, but the quality of the approximation deteriorates for at-the-money (ATM) options at the expiry day and for in-the-money (ITM) options with a long time to maturity.

4. Conclusion and future plans

Generally, an options exchange trades various options on the same underlying with a range of exercise (strike) prices and expiry days, so to price all these options an ANN should receive as input the set of price-time-strike-expiry points (S, t, K, T) instead of the price-time points (S, t). This transition from input (S, t) to input (S, t, K, T) may require a different ANN architecture and another training strategy.

As noted in [15, 16], a loss function in the form of an energy functional (potential) is preferable for loss minimization, so the construction of a variational formulation of BSM PDE (1) can contribute to deep option pricing. An energy functional (potential) for BSM PDE (1) may be obtained using methods of the inverse problem of the calculus of variations [17].

The computational experiment with deep option pricing in the hyperbolic sine LVM demonstrates that the algorithm converges to the exact analytical European call option price of the LVM with acceptable accuracy, but the oscillating behavior of the option price approximation makes it desirable to modify the neural network architecture to smooth its output.

References

[1] B. Dupire. Pricing with a smile // Risk Magazine 7 (1) (1994) 18–20.
[2] F. Black, M. Scholes. The Pricing of Options and Corporate Liabilities // Journal of Political Economy 81 (3) (1973) 637–654. doi:10.1086/260062
[3] R. C. Merton. Theory of Rational Option Pricing // The Bell Journal of Economics and Management Science 4 (1) (1973) 141–183. doi:10.2307/3003143
[4] J. C. Cox, S. A. Ross. The valuation of options for alternative stochastic processes // Journal of Financial Economics 3 (1-2) (1976) 145–166. doi:10.1016/0304-405x(76)90023-4
[5] D. Brigo, F. Mercurio. Fitting volatility skews and smiles with analytical stock-price models. Seminar paper, Institute of Finance, University of Lugano (2000).
[6] G. E. Uhlenbeck, L. S. Ornstein. On the theory of the Brownian motion // Phys. Rev. 36 (1930) 823–841. doi:10.1103/PhysRev.36.823
[7] S. Liu, C. W. Oosterlee, S. M. Bohte. Pricing Options and Computing Implied Volatilities using Neural Networks // Risks 7 (2019) 16. doi:10.3390/risks7010016
[8] J. Sirignano, K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations // Journal of Computational Physics 375 (2018) 1339–1364. doi:10.1016/j.jcp.2018.08.029
[9] L. Bottou, O. Bousquet. The tradeoffs of large scale learning, in: Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS'07, Curran Associates Inc., Red Hook, NY, USA (2007) 161–168.
[10] S. Hochreiter, J. Schmidhuber. Long short-term memory // Neural Computation 9 (8) (1997) 1735–1780. doi:10.1162/neco.1997.9.8.1735
[11] R. K. Srivastava, K. Greff, J. Schmidhuber. Training very deep networks, in: C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, R. Garnett (Eds.), Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada (2015) 2377–2385.
[12] D. P. Kingma, J. Ba. Adam: A method for stochastic optimization, in: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). URL: http://arxiv.org/abs/1412.6980
[13] S. Shorokhov, M. Fomin. Modeling of financial asset prices with hyperbolic-sine stochastic model, in: V. Sukhomlin, E. Zubareva (Eds.), Convergent Cognitive Information Technologies. Convergent 2018, Communications in Computer and Information Science, Springer, Cham 1140 (2020) 3–10. doi:10.1007/978-3-030-37436-5_1
[14] M. Abadi, et al. TensorFlow: a system for large-scale machine learning, in: OSDI'16: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (2016) 265–283.
[15] Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, P. Perdikaris. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data // Journal of Computational Physics 394 (2019) 56–81. doi:10.1016/j.jcp.2019.05.024
[16] N. Geneva, N. Zabaras. Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks // Journal of Computational Physics 403 (2020) 109056. doi:10.1016/j.jcp.2019.109056
[17] V. M. Filippov, V. M. Savchin, S. G. Shorokhov. Variational principles for nonpotential operators // Journal of Mathematical Sciences 68 (3) (1994) 275–398. doi:10.1007/bf01252319