Introduction

On Solving the Problem of Optimal Probability Distribution Quantization

Dmitry Golembiovsky

golemb@cs.msu.su 0

Dmitry Denisov

dvden@bk.ru 0

Evgeny Antonov

eugeneP.antonov@gmail.com 0 0 Lomonosov Moscow State University , GSP-1, Leninskie Gory, 119991 Moscow , Russia

217 223

The program of optimal quantization of a continuous distribution suggested by Heitsch H. and W. Romisch in 2003 is generalized for arbitrage exclusion in financial models. It is a non-convex problem, which belongs to the class of NP-hard problems. In the paper, three different approximate algorithms are developed for finding the global extremum. Two of them are based on the separation of variables according to their power in the objective function while the other is a SQP algorithm. The numerical results of using the algorithms are provided. The effectiveness and speed of the problem solving are compared.

Introduction

Currently one of the most rapidly developing optimization approaches is the stochastic dual dynamic programming (SDDP) algorithm (see [Pereira & Pinto, 1991]) . The algorithm belongs to the class of dynamic optimization algorithms and allows to solve problems where one or more parameters are represented by random variables. SDDP is a procedure of consecutive solving of linear problems, thus obtaining new constraints of objective function in a form of cutting hyperplanes and, as a result, obtaining upper and lower bounds of an optimal solution of the problem. One of the essential parts of SDDP algorithm is constructing a set of possible scenarios of random variables. Scenario models for stochastic optimization in finance must exclude arbitrage possibilities (look, for example [Consiglio et al., 2014], [Geyer et al., 2010]) .

The paper analyzes the problem of constructing a scenario lattice for the joint evolution of a set of variables representing the values of economic indicators such as interest rates, exchange rates, prices of goods, securities or other financial instruments. It is assumed that the distribution of such variables is known, and a number of its realizations can be simulated. Also it is assumed that forward prices of such variables can be calculated at each stage of the simulated trajectories. The lattice considered in the paper satisfies several restrictions: positivity of each variable in the nodes, equality to one of the sum of the transition probabilities from a node and the distinctive feature of the lattice is an absence of arbitrage opportunities. It means that for each node of the current stage, an expected value of variables at the next stage is equal to its forward prices in the current node. The optimal lattice is obtained through solving an optimization problem which is not convex.

The second section of the paper presents formal definition of the problem, while 2.1.1, 2.1.2 and 2.2.1 subsections contain algorithms suggested for its solution. The first two are based on a separation of the variables of the optimization problem according to their power in the objective function while the latter uses a gradient of the objective function to iteratively approach the optimal value. The third section of the paper contains the comparison of the algorithms in particular cases. 2

Problem De nition and Methodology

Consider nx variables which represent particular economic indicators. Let the number of the nodes at the stage t is lt (equals 1 for the first stage) and the nodes till the stage t − 1 have been already formed. Denote by Ftk−1, k = 1, . . . , lt−1 the nx-elements vector of prices in the nodes of the previous stage and by F k(frw), k = 1, . . . , lt−1 t−1 the vector of corresponding forward prices. The stage t of the scenario lattice is constructed by solving the next optimization problem.

Denote by ftjk, j = 1, . . . , Lt the realizations of the vector of considered variables at the stage t generated by the Monte-Carlo method from the node k of the previous stage t − 1. (Like lt parameter Lt equal to 1 for t = 1, Lt >> lt). The vectors of the values of the random variables which are assigned to the nodes of the stage t are denoted by Fti, i = 1, . . . , lt. They are variables of the optimization problem.

We also introduce variables gtikj ≥ 0, where i = 1, . . . , lt is a number of the according node at the stage t; k = 1, . . . , lt−1 is a number of the node at the stage t − 1; j = 1, . . . , Lt is a number of realization of the random variables generated from the node k of the previous stage. Each variable gtikj shows to what extent the according random pattern ftjk belongs to the corresponding node. So, the condition lt ∑ gtikj = 1 i=1 must be fulfilled for all j and k. The transition probability from node k of the stage t − 1 to node i of stage t is obtained as ρitk = ∑jL=t 1 gtikj , k = 1, . . . , lt−1, i = 1, . . . , lt

Lt It follows from (1) and (2) that ∑lt

i=1 ρitk = 1 for any k = 1, . . . , lt−1.

The objective function of the optimization problem, similar to [Heitsch & Romisch, 2003], is a sum of the distances between the values of random variables relevant to the nodes of the stage t and realizations of the random patterns that were produced by Monte-Carlo:

min

Fti;i=1;:::;lt gtikj;k=1;:::;lt−1;i=1;:::;lt;j=1;:::;Lt lt−1 lt Lt ∑ ∑ ∑ gtikj(ftjk − Fti)T (ftjk − Fti) k=1 i=1 j=1 The problem includes the following constraints:

t ≥ 0, gtikj ≥ 0, k = 1, . . . , lt−1, i = 1, . . . , lt, j = 1, . . . , Lt F i lt ∑ gtikj = 1, k = 1, . . . , lt−1, j = 1, . . . , Lt i=1 ρitk = ∑jL=t 1 gtikj , k = 1, . . . , lt−1, i = 1, . . . , lt

Lt lt ∑ Ftiρitk = F k(frw), k = 1, . . . , lt−1

t−1 i=1

Constraints (7) ensure a non-arbitrage condition on the scenario lattice: for each node, the expected values of random variables at the next stage must be equal to their forward values in the node.

The objective function of the considered problem is non-convex. There are well-known methods of the global extremum search in box-constraint optimization problems developed in [Evtushenko & Posypkin, 2013], [Khamisov, 1999], [Strongin & Sergeyev, 2000]. In such methods, the objective function is approximated from below by linear and quadratic functions on the elements of the feasible set partition. The feasible set of the problem considered in the paper is determined by bilinear equality constraints and contains non-regular points. The goal of this paper is to compare effectiveness of methods based on a consecutive solving of less complex problems. The constraints structure of the problem allows us to explicitly define a starting feasible point.

Denote by X the set of feasible points (g, F ) for constraints of the problem (3) – (7). The problem is obviously has a solution if the set {F : (g, F ) ∈ X} is bounded. This property is fulfilled in a number of cases, namely the following lemma takes place.

Lemma 1: If for each point (g, F ) ∈ X the sum ∑jL=t 1 gtikj ≥ δ for some δ > 0, then X is bounded.

Proof: Assume that there exists the sequence {(g(n), F (n))} ⊂ X, n = 1, 2, . . ., for which ∥ F (n) ∥→ ∞, n → ∞. It follows from here that there exists an index m ∈ {1, 2, . . . , nx}, for which an element Fti;m(n) → ∞ for some i. From here and from (7) it follows that Ftk−1;m = ∞, that contradicts the meaning of the vector Ftk−1.

Obviously, the problem (3) – (7) is not convex. To solve it, two classes of algorithms are suggested. The first one is based on separation of variables according to their power in the objective function. The second one is the sequential quadratic programming (SQP) algorithm. 2.1 2.1.1

Variables Separation Algorithms Algorithm 1.1

The problem is solved by a consecutive minimization of the objective function over the group of variables gtikj that have a power of 1 in the objective function and Fti that have a power of 2 afterwards. A. Put Y = ∞ B. Get starting values for Fti and gtikj C. Solve the the problem (3) – (7) for the variables Fti, i = 1, . . . , lt. (It is a quadratic problem with linear constraints).

D. If the absolute value of the difference between Y and the objective function value is less than ε, then End. E. Fix the obtained values of the variables Fti, i = 1, . . . , lt.

F. Save the value of the objective function to the variable Y .

G. Solve the problem (3) – (7) for the variables gtikj, k = 1, . . . , lt−1, i = 1, . . . , lt, j = 1, . . . , Lt. (It is a linear problem).

H. Fix the obtained values of the variables gtikj, k = 1, . . . , lt−1, i = 1, . . . , lt, j = 1, . . . , Lt.

I. Go to the item C. 2.1.2

Algorithm 1.2

The following algorithm is based on the representation of the problem (3) – (7) in a form of a parametric linear problem

The idea of the algorithm is to consecutively update vector of parameters F based on the problem dependency on this parameter. Along with this it is assumed that linear problems, presented below are feasible in a neighborhood of the sought-for solution.

max(−c(F ), g)

g g ≥ 0, A(F )g = b A. Set the tolerance parameters ε, βmin > 0 for a stopping criteria.

(g(n), F (n)) ∈ X. Set the initial shift value β(n).

Put n = 0, get starting values B. For fixed F (n) solve the problem (3) – (7) as a linear problem over variables g. Fix f0∗(F (n)) as an optimal value of the objective function, and its corresponding optimal solution g(n + 1), defined by the set F (n).

C. Define G(n) from the following rule:

lt−1 Lt Git;m(1) = max(0, Fti;m(n) − β(n) ∑ ∑ gtikj(n + 1)(Fti;m(n) − ftjk;m)), i = 1, 2, . . . , lt, m = 1, 2, . . . , nx.

k=1 j=1 D. Solve the problem (3) – (7) over g for the fixed set of variables G(n) instead of F (n). Fix f0∗(G(n)) as an optimal value of the objective function.

E. If f0∗(G(n)) − f0∗(F (n)) < −ε, then put F (n + 1) = G(n), n = n + 1. Go to the item B. F. If β(n) > βmin, then set β(n + 1) = β(n) \ 2, F (n + 1) = F (n), n = n + 1. Go to the item B.

G. (g(n), F (n)) is the problem solution. The issue of the feasibility of linear problems specified in Algorithm 1.2 can be reduced to the issue of the stability of the linear program solution g∗, where (g∗, F ∗) is an optimal solution of the problem (3)–(7). The following lemma takes place

Lemma 2: The problem (8) – (9) is stable in its solution g∗, if g∗ > 0, (g∗, F ∗) is the solution of the problem (3) – (7), in which vectors Fti are linearly independent.

Proof: The stability of the linear program in the solution g∗ is equivalent to the existence of such solutions in a feasible region of primal and dual linear programs, in which all inequality constraints are strict and fulfilled, provided that constraints matrix has a full rank. For a primal program such solution is g∗. The existence of such solution for a dual program is obvious. From a linear independency of vectors Fti if follows that the constraints matrix has a full rank. 2.2

SQP Algorithm

The Algorithm described hereinafter belongs to the class of SQP algorithms. It implies solving an auxiliary quadratic program at each stage when choosing a descent direction. This class of algorithms is implemented for Optimization and Variational Problems (look, for example [Izmailov & Solodov, 2014], [Gould & Robinson, 2010 (1)] and [Gould & Robinson, 2010 (2)]. 2.2.1

Algorithm 2

Denote by x a pair of (g, F ), by q(x) – expression ∑lt i=1 Fti ∑jL=t 1 gtikj − Ftk−1.

i=1 gtikj − 1, by r(x) – expression L1t ∑lt The problem (3) – (7) can be equivalently rewritten

[ min (f0′ (xn, sn) + sn Here x ∈ Rlt−1·lt·Lt+nx·lt , q(x) ∈ Rlt−1·Lt , r(x) ∈ Rnx·lt−1

The problem (10) – (11) has a large dimensionality, the objective function and its constraints are not convex. Besides that, constraints gradients are not linearly independent. The problem contains equation constraints as well as equality constraints. SQP-algorithms for problems with equality constraints in which constraints gradients are not linearly independent are presented in [Izmailov & Uskov, 2017].

The problem (10) – (11) can be solved iteratively by choosing the vector of descent sn = (sgn, snF ) on the step n, sgn ∈ Rlt−1·lt·Lt , snF ∈ Rnx·lt which optimizes the following quadratic problem: (10) (11) (12) q(xn) + (q′(xn), sn) = 0, r(xn) + (r′(xn), sn) = 0, xn + sn ≥ 0 where D is a positive-definite matrix.

The problem (12) – (13) is feasible under some assumptions. In particular, constraints gradients should be linearly independent.

For the particular case the problem (12) – (13) can be rewritten as follows

 msnin 0.5 ∑lt1 ∑lt ∑Lt sgnikj ||Ftin − ftjk||2 + ∑lt ∑nx snF mi ∑lt1 ∑lt (Ftin;m − ftjk;m)gtinjk + 12 k=1 i=1 j=1 i=1 m=1 k=1 i=1  (Dsn, sn) gtinjk + sgnikj ≥ 0, k = 1, . . . , lt−1, i = 1, . . . , lt, j = 1, . . . , Lt

Ftin;m + snF mi ≥ 0, i = 1, . . . , lt, m = 1, . . . , nx lt ∑ sgnikj = 1, k = 1, . . . , lt−1, j = 1, . . . , Lt i=1 1 ∑Lt gtinjk Lt j=1 lt ∑ Ftin;m − Ftk−1n;m + i=1 1 ∑Lt ∑lt sgnikj Ftin;m + Lt j=1 i=1 1 lt Lt

∑ snF mi ∑ gtinjk = 0, k = 1, . . . , lt−1, m = 1, . . . , nx Lt i=1 j=1 (18) As mentioned above,the matrix D should be positive-definite. For example, D can be an identity matrix of a corresponding dimension (lt−1 · lt · Lt + nx · lt × lt−1 · lt · Lt + nx · lt) or, alternatively, this can be a matrix of second-order derivatives on each step:

D =

A. Get a starting vector x0 = (g0, F0).

B. Solve the problem (12) – (13) and obtain the vectors sn = (sgn, snF ) and yn = (yqn, yrn) of optimal and dual solutions correspondingly, where n is the index of the current step (13) (14) (15) (16) (17) C. Until φ(xn + βsn+1) ≤ φ(xn) + 12 β ((f0′ (xn), sn+1) − Gψ(xn)) do β := 12 β, where φ(x) = f0(x) + Gψ(x), ψ(x) = ∑nx·lt−1 |ri(x)| + ∑min(0, x),

i=1 G = maxi |yni| + a, a > 0 is a penalty parameter, min(0, x) is a component-wise minimum of vector x.

D. If β is less than ε, then End.

E. Obtain new values of vector x as xn+1 := xn + βsn+1

F. Go to the item B

2.3

Scenario Lattice Construction Algorithm

The algorithm of the scenario lattice constructing is the following. We suppose that the principal component analysis has been produced and all selected principal components are presented by the according ARIMAGARCH models. Denote by vtjk the vector of the parameters of the ARIMA-GARCH model with the vector of forward prices ftjk. Further we will address the vectors vtjk and ftjk as “a k, j - pattern at the stage t”. For the construction of this algorithm put l0 = 1 also. A. Assign the current values of the forward prices to the vector F11 and the current parameters of ARIMA1

GARCH models to the vector v1;1 which form 1,1-pattern at the stage 1 of the scenario lattice. B. Construct the degenerated probability distribution of the patterns at the first stage ptn−1;m = 1, m = 1, n = 1. C. t := 2 D. k := 1 E. For j = 1, . . . , Lt Do:

Select one from the vectors vn

t−1;m, m = 1, . . . , lt−2, n = 1, . . . , Lt−1 per the probability distribution ptn−1;m, m = 1, . . . , lt−2, n = 1, . . . , Lt−1. Using this vector generate one k, j-pattern for the stage t by Monte-Carlo method.

F. If k < lt−1 then k := k + 1 and go to the item E.

G. Solve problem (3) – (7) using Algorithms presented above. Assign forward and futures prices Fti, i = 1, , lt to the according nodes of the stage t.

H. Calculate transition probabilities ρitk, k = 1, . . . , lt−1, i = 1, . . . , lt using formula (2).

I. If t < T then t := t + 1, else End.

J. Construct the distribution pt−1;m = 3

Empirical Results

∑lkt=−12 gtm−n1;k , m = 1, . . . , lt−1, n = 1, . . . , Lt−1. Go to the item D.

lt−1×Lt−1

As an empirical example, the joint dynamic of 5 variables was examined. The first variable represents a spot exchange rate for the currencies USD and RUB, while others are interest rates of different maturities in Russia and the United States. Principal component analysis (PCA) was applied to these variables to construct 5 independent time series. After that ARIMA-GARCH model parameters were estimated for each time series and all principal components were simulated. This allowed to produce, after using the factor loadings matrix, the set of Monte-Carlo realizations of the original variables. Then the scenario lattice was constructed using Algorithms presented in Section 2. The results of the study are presented in the table 1.

As it can be seen from the presented table, the SQP algorithm despite being the most time-efficient in all cases gives the worst approximation of optimal solution. Among algorithms that use variables separation the one based on consecutive optimization over each group of variables is preferable. It should be also mentioned, that after completing of both Algorithm 1.1 and Algorithm 1.2 its solutions were utilized as a starting value for SQP algorithm. The latter one has not improved the value of objective function which means that obtained solutions represent stationary points of the objective function.

[Pereira & Pinto , 1991] Pereira, M.V.F. & Pinto , L.M.V.G. ( 1991 ). Multi-stage stochastic optimization applied to energy planning . Mathematical programming , 52 ( 1-3 ), 359 - 375 .

[Heitsch & Romisch , 2003] Heitsch, H. & Romisch , W. ( 2003 ). Scenario Reduction Algorithms in Stochastic Programming . Computational Optimization and Applications , 24 ( 2-3 ), 187 - 206 .

[Consiglio et al., 2014 ] Consiglio, Andrea & Carollo, Angelo & Zenios, Stavros A. ( 2014 ). Generating Multi-factor Arbitrage-Free Scenario Trees with Global Optimization . Working Papers of The Wharton Financial Institutions Center , 13 - 35 .

[Geyer et al., 2010 ] Geyer, Alois, Hanke, Michael & Weissensteiner, Alex, ( 2010 ). No-arbitrage conditions, scenario trees, and multi-asset financial optimization . European Journal of Operational Research , 206 ( 3 ), 609 - 613 .

[Izmailov & Solodov , 2014] Izmailov

A.F.

, Solodov

M.V.

( 2014 ). Newton-Type Methods for Optimization and Variational Problems . Switzerland: Springer.

[Evtushenko & Posypkin , 2013] Evtushenko, Y. , Posypkin , M. ( 2013 ). A deterministic approach to global boxconstrained optimization . Optimization Letters , 7 ( 4 ), 819 - 829 .

[Khamisov , 1999] Khamisov, O. ( 1999 ). On Optimization Properties of Functions, with a Concave Minorant . Journal of Global Optimization , 14 ( 1 ), 79 - 101 .

[Strongin & Sergeyev , 2000] Strongin, R.G. , Sergeyev , Y.D. ( 2000 ). Global Optimization with Non-Convex Constraints. Sequential and Parallel Algorithms .. New York: Springer.

[Gould & Robinson , 2010 (1)] Gould, N.I.M. , Robinson , D.P. ( 2010 ). A Second Derivative SQP Method: Global Convergence . SIAM Journal on Optimization , 20 ( 4 ), 2023 - 2048 .

[Gould & Robinson , 2010 (2)] Gould, N.I.M. , Robinson , D.P. ( 2010 ). A Second Derivative SQP Method: Local Convergence and Practical Issues . SIAM Journal on Optimization , 20 ( 4 ), 2049 - 2079 .

[Izmailov & Uskov , 2017] Izmailov

A.F.

& Uskov

E.I.

( 2017 ). Subspace-stabilized sequential quadratic programming . Computational Optimization and Applications , Springer, 67 ( 1 ), 129 - 154 .