Data Science

REDUCING THE SAMPLE SIZE WHEN ESTIMATING CONDITIONAL QUANTILES

S. Shatskikh (1), L. Melkumova (2)
(1) Samara National Research University, Samara, Russia
(2) Mercury Development Russia, Samara, Russia

Abstract. In this paper we propose using a property of multivariate probability distributions, which we call conditional quantile reproducibility, to decrease the number of observations required to construct a statistical estimate of an n-dimensional conditional quantile of the distribution. For the class of probability distributions satisfying this property, we present several results showing that in many cases the reproducibility property allows us to restore the n-dimensional conditional quantile by solving a certain type of Pfaffian differential equation. The equation is constructed from functions derived only from the 2-dimensional marginal distributions of the initial distribution.

Keywords: multivariate conditional quantiles, quantile reproducibility, Pfaffian quantile differential equation, conditional quantile estimation

Citation: Shatskikh S, Melkumova L. Reducing the Sample Size When Estimating Conditional Quantiles. CEUR Workshop Proceedings, 2016; 1638: 769-781. DOI: 10.18287/1613-0073-2016-1638-769-781

1 Introduction

Many widely used multivariate probability distributions satisfy a certain property of multivariate conditional quantiles, which we call "quantile reproducibility". Among them are the multivariate Gaussian distribution, Student's distribution, the Logistic distribution, the Pareto distribution, the Gamma distribution, Clayton's copula distribution and others (see [9, 8, 6, 5]). This article gives several results related to the quantile reproducibility property of multivariate distributions.
In particular, we prove that when a distribution satisfies the quantile reproducibility property (we also say that it "has reproducible conditional quantiles"), the solution of a Pfaffian differential equation of a certain form, which can be constructed from the 2-dimensional marginal distributions of the original distribution alone, is equal to a multivariate conditional quantile of the distribution. In other words, by solving the Pfaffian differential equation we build a multivariate quantile from bivariate functions describing the probability distribution.

Information Technology and Nanotechnology (ITNT-2016)

The outline of this article is as follows. In Section 2 we introduce multivariate conditional quantiles and discuss some of their applications. In Section 3 we define quantile reproducibility for multivariate distributions and discuss its geometric interpretation. In Section 4 we introduce the Pfaffian differential equation of a distribution, which we refer to as the "quantile equation". We also show that, for a distribution with reproducible quantiles, the solution of the quantile equation of the maximum possible dimension equals a conditional quantile of the maximum dimension. Section 5 gives several concrete examples of multivariate distributions with reproducible quantiles along with the corresponding quantile equations. In Section 6 we give an intermediate version of the quantile reproducibility property and prove that, when it is satisfied, one can find solutions of the quantile equation of intermediate dimensions. In Section 7 we illustrate this theorem with an example of a distribution that has this version of quantile reproducibility and find the solutions of its quantile equation. In Section 8 we discuss how the quantile reproducibility property could be used in statistical estimation.
For the case when the distribution is known to have reproducible conditional quantiles, we propose a technique for building a multivariate quantile estimate from bivariate observations only. We also show that this technique reduces the required number of observations compared to the traditional approach to multivariate quantile estimation.

2 Conditional Quantiles and Their Applications

First we give the definition of the multivariate conditional quantile. Consider a system of random variables $X_1, X_2, \ldots, X_n$. Suppose that the conditional cumulative distribution function $F_{i|1\ldots\hat{\imath}\ldots n}(x_i \mid x_1, \ldots, \hat{x}_i, \ldots, x_n)$ (a hat over an element means that the element is omitted) is continuous and increases monotonically in $x_i$ for any fixed vector $(x_1, \ldots, \hat{x}_i, \ldots, x_n) \in \mathbb{R}^{n-1}$.

The conditional quantile $q^{(p)}_{i|1\ldots\hat{\imath}\ldots n}(x_1, \ldots, \hat{x}_i, \ldots, x_n)$ of level $p \in [0, 1]$ for the random variable $X_i$ given $X_1, \ldots, \hat{X}_i, \ldots, X_n$ is defined by the identity
\[
F_{i|1\ldots\hat{\imath}\ldots n}\bigl(q^{(p)}_{i|1\ldots\hat{\imath}\ldots n}(x_1, \ldots, \hat{x}_i, \ldots, x_n) \mid x_1, \ldots, \hat{x}_i, \ldots, x_n\bigr) \equiv p
\]
for any $(x_1, \ldots, \hat{x}_i, \ldots, x_n) \in \mathbb{R}^{n-1}$. The conditional median $m_{i|1\ldots\hat{\imath}\ldots n}(x_1, \ldots, \hat{x}_i, \ldots, x_n)$ for the random variable $X_i$ given $X_1, \ldots, \hat{X}_i, \ldots, X_n$ is the conditional quantile of level $p = \tfrac{1}{2}$.

Another way to define a conditional quantile is to specify a point $x^\circ = (x_1^\circ, \ldots, x_n^\circ) \in \mathbb{R}^n$ lying on the quantile surface:
\[
F_{1|2\ldots n}\bigl(q^{(x^\circ)}_{1|2\ldots n}(x_2, \ldots, x_n) \mid x_2, \ldots, x_n\bigr) \equiv F_{1|2\ldots n}(x_1^\circ \mid x_2^\circ, \ldots, x_n^\circ).
\]
The conditional quantile is then said to go through $x^\circ$. In this case the level of the quantile is $p = F_{1|2\ldots n}(x_1^\circ \mid x_2^\circ, \ldots, x_n^\circ)$.

One of the applications of conditional quantiles is their use as estimates of random variables. Consider the following situation. We have a set of random variables $X_1, X_2, \ldots, X_n$.
We observe the first $n-1$ of them, and the last one needs to be estimated. We also know the conditional distribution function $F_{n|1\ldots n-1}(x_n \mid x_1, \ldots, x_{n-1})$, which is supposed to be monotonic in $x_n$. Now, having defined the loss function
\[
\rho_\tau(u) =
\begin{cases}
(\tau - 1)u, & u \le 0,\\
\tau u, & u > 0,
\end{cases}
\qquad \tau \in (0, 1),
\]
we want an estimate of the last random variable $X_n$ that minimizes the conditional loss (here $\mathsf{M}$ denotes the expectation):
\[
\hat{f}(x_1, \ldots, x_{n-1}) = \arg\min_{f(\cdot)} \mathsf{M}\{\rho_\tau(X_n - f(x_1, \ldots, x_{n-1})) \mid x_1, \ldots, x_{n-1}\}.
\]
Simple calculations lead to the following equation for the estimate of $X_n$:
\[
F_{n|1\ldots n-1}\bigl(\hat{f}(x_1, \ldots, x_{n-1}) \mid x_1, \ldots, x_{n-1}\bigr) \equiv \tau.
\]
That is, any value satisfying this equation minimizes the conditional loss. As long as $F$ is monotonic in $x_n$, there is only one such value, which is the conditional quantile
\[
\hat{f}(x_1, \ldots, x_{n-1}) = F^{-1}_{n|1\ldots n-1}(\tau \mid x_1, \ldots, x_{n-1}) = q^{(\tau)}_{n|1\ldots n-1}(x_1, \ldots, x_{n-1}).
\]
Furthermore, it is known (see [10]) that the estimate minimizing the conditional loss also minimizes the expected loss (the Bayesian risk):
\[
\mathsf{M}\{\rho_\tau(X_n - \hat{f}(X_1, \ldots, X_{n-1}))\} = \min_{f(\cdot)} \mathsf{M}\{\rho_\tau(X_n - f(X_1, \ldots, X_{n-1}))\}.
\]

3 Reproducible Conditional Quantiles

Consider a vector of random variables $X = (X_1, \ldots, X_n)$ with the cumulative distribution function $F_{1\ldots n}(x_1, \ldots, x_n)$, the strictly positive multivariate density
\[
f_{1\ldots n}(x_1, \ldots, x_n) > 0, \quad \forall (x_1, \ldots, x_n) \in \mathbb{R}^n,
\]
and the conditional distribution functions
\[
\mathsf{P}\{X_1 \le x_1 \mid X_2 = x_2, \ldots, X_n = x_n\} = F_{1|2\ldots n}(x_1 \mid x_2, \ldots, x_n),
\]
\[
\mathsf{P}\{X_i \le x_i \mid X_j = x_j\} = F_{i|j}(x_i \mid x_j), \quad i \ne j, \quad i, j = \overline{1, n}.
\]
By fixing a point $x^\circ = (x_1^\circ, \ldots, x_n^\circ) \in \mathbb{R}^n$, we get a family of conditional quantiles going through the selected point, which act as level surfaces (or curves) of the conditional CDF:
\[
F_{1|2\ldots n}\bigl(q^{(x^\circ)}_{1|2\ldots n}(x_2, \ldots, x_n) \mid x_2, \ldots, x_n\bigr) \equiv F_{1|2\ldots n}(x_1^\circ \mid x_2^\circ, \ldots, x_n^\circ),
\qquad q^{(x^\circ)}_{1|2\ldots n}(x_2^\circ, \ldots, x_n^\circ) = x_1^\circ,
\]
\[
F_{i|j}\bigl(q^{(x_i^\circ, x_j^\circ)}_{i|j}(x_j) \mid x_j\bigr) \equiv F_{i|j}(x_i^\circ \mid x_j^\circ),
\qquad q^{(x_i^\circ, x_j^\circ)}_{i|j}(x_j^\circ) = x_i^\circ, \quad i \ne j.
\]

Definition 1 (see [8]). We say that the multivariate probability distribution $F_{1\ldots n}(x_1, \ldots, x_n)$ has reproducible conditional quantiles if the system of identities
\[
\begin{aligned}
q^{(x^\circ)}_{1|2\ldots n}\bigl(x_2,\ q^{(x_3^\circ, x_2^\circ)}_{3|2}(x_2),\ \ldots,\ q^{(x_n^\circ, x_2^\circ)}_{n|2}(x_2)\bigr) &\equiv q^{(x_1^\circ, x_2^\circ)}_{1|2}(x_2),\\
&\ \ \vdots\\
q^{(x^\circ)}_{1|2\ldots n}\bigl(q^{(x_2^\circ, x_n^\circ)}_{2|n}(x_n),\ \ldots,\ q^{(x_{n-1}^\circ, x_n^\circ)}_{n-1|n}(x_n),\ x_n\bigr) &\equiv q^{(x_1^\circ, x_n^\circ)}_{1|n}(x_n)
\end{aligned}
\tag{1}
\]
holds for any given point $x^\circ = (x_1^\circ, \ldots, x_n^\circ) \in \mathbb{R}^n$.

For a geometric interpretation, consider the curves parametrized by the "small" conditional quantiles ($k = \overline{2, n}$):
\[
\gamma_k(x^\circ, t) = \bigl\{ q^{(x_1^\circ, x_k^\circ)}_{1|k}(t),\ \ldots,\ q^{(x_{k-1}^\circ, x_k^\circ)}_{k-1|k}(t),\ t,\ q^{(x_{k+1}^\circ, x_k^\circ)}_{k+1|k}(t),\ \ldots,\ q^{(x_n^\circ, x_k^\circ)}_{n|k}(t) \bigr\}.
\]
Quantile reproducibility means that these curves lie on the "big" quantile surface
\[
\Gamma^{(x^\circ)}(x_2, \ldots, x_n) = \bigl\{ q^{(x^\circ)}_{1|2\ldots n}(x_2, \ldots, x_n),\ x_2,\ \ldots,\ x_n \bigr\}.
\]

4 Quantile Pfaffian Differential Equations and Their Relation to Quantile Reproducibility

It can be shown that for the class of multivariate probability distributions with reproducible conditional quantiles we can construct the "big" $(n-1)$-variate conditional quantile as the solution of a Pfaffian differential equation of a special form. The equation itself is built from functions derived from the 1-dimensional conditional quantiles that correspond to the 2-dimensional marginal distributions of the initial distribution. In other words, the property of quantile reproducibility lets us pass from bivariate functions characterizing the probability distribution to its multivariate characteristic.

Again we consider a random vector $X = (X_1, \ldots, X_n)$ with cumulative distribution function $F_{1\ldots n}(x_1, \ldots, x_n)$ and strictly positive density
\[
f_{1\ldots n}(x_1, \ldots, x_n) > 0, \quad \forall (x_1, \ldots, x_n) \in \mathbb{R}^n.
\]
Let us introduce the determinant
\[
W(x^\circ) =
\begin{vmatrix}
e_1 & e_2 & e_3 & \ldots & e_n\\
\dot{q}^{(x_1^\circ, x_2^\circ)}_{1|2}(x_2^\circ) & 1 & \dot{q}^{(x_3^\circ, x_2^\circ)}_{3|2}(x_2^\circ) & \ldots & \dot{q}^{(x_n^\circ, x_2^\circ)}_{n|2}(x_2^\circ)\\
\ldots & \ldots & \ldots & \ldots & \ldots\\
\dot{q}^{(x_1^\circ, x_n^\circ)}_{1|n}(x_n^\circ) & \dot{q}^{(x_2^\circ, x_n^\circ)}_{2|n}(x_n^\circ) & \dot{q}^{(x_3^\circ, x_n^\circ)}_{3|n}(x_n^\circ) & \ldots & 1
\end{vmatrix}.
\]
We denote by $e_i$ the basis vectors in $\mathbb{R}^n$. The dot over a quantile means differentiation: $\dot{q}^{(x_i^\circ, x_j^\circ)}_{i|j}(x_j^\circ)$ is the derivative with respect to $x_j$ of the one-dimensional quantile $q^{(x_i^\circ, x_j^\circ)}_{i|j}(x_j)$ going through the point $(x_i^\circ, x_j^\circ)$, taken at $x_j = x_j^\circ$.

To simplify the notation, let us expand the determinant along the first row:
\[
W(x^\circ) = \sum_{k=1}^{n} A_{1k}(x_1^\circ, \ldots, x_n^\circ)\, e_k.
\]
Each of the cofactors $A_{1k}$ depends on a set of quantile derivatives $\dot{q}^{(x_i^\circ, x_j^\circ)}_{i|j}(x_j^\circ)$. Now we replace the point $x^\circ = (x_1^\circ, \ldots, x_n^\circ)$ with $x = (x_1, \ldots, x_n)$, which is variable in $\mathbb{R}^n$, and consider the differential 1-form
\[
\omega = \sum_{k=1}^{n} A_{1k}(x_1, \ldots, x_n)\, dx_k.
\]
Next we construct a Pfaffian differential equation for this form:
\[
\omega = \sum_{k=1}^{n} A_{1k}(x_1, \ldots, x_n)\, dx_k = 0.
\]
We call it the quantile equation.

Theorem 1. If the probability distribution $F_{1\ldots n}(x_1, \ldots, x_n)$ with a joint PDF positive on $\mathbb{R}^n$ has reproducible conditional quantiles (1), and $A_{11}(x_1, \ldots, x_n) \ne 0$, then the quantile equation
\[
\omega = \sum_{i=1}^{n} A_{1i}(x_1, \ldots, x_n)\, dx_i = 0 \tag{2}
\]
is completely integrable. The solution of (2) going through the given point $x^\circ$ is the "big" conditional quantile $x_1 = q^{(x^\circ)}_{1|2\ldots n}(x_2, \ldots, x_n)$.

The proof of this theorem is given in [6].

A well-known result, the Frobenius theorem (see [1], p. 97), gives a necessary and sufficient condition for the complete integrability of a Pfaffian differential equation.
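Before turning to the integrability criterion, the construction of the quantile equation can be checked numerically in a simple case. The sketch below is an illustration only, not the authors' code. It takes the 3-dimensional Cauchy distribution and assumes two closed forms that follow from the scale-family structure of Cauchy conditional distributions but are not quoted from the text: the bivariate quantile slope $x_i x_j / (1 + x_j^2)$ and the "big" quantile $x_1 = x_1^\circ \sqrt{(1 + x_2^2 + x_3^2)/(1 + (x_2^\circ)^2 + (x_3^\circ)^2)}$. All function names (`qdot`, `cofactors`, `table_coefficients`, `big_quantile`, `residuals`) are hypothetical. The code compares the first-row cofactors of the determinant with the Cauchy entry of the table of quantile equations in Section 5, and then checks that the assumed "big" quantile annihilates the form.

```python
import math

def qdot(xi, xj):
    """Slope at xj of the quantile curve of X_i given X_j through (xi, xj).

    Assumed closed form for the bivariate Cauchy marginal with density
    proportional to (1 + xi^2 + xj^2)^(-3/2); not quoted from the paper.
    """
    return xi * xj / (1.0 + xj * xj)

def cofactors(x1, x2, x3):
    """First-row cofactors (A11, A12, A13) of the determinant W for n = 3."""
    q12, q32 = qdot(x1, x2), qdot(x3, x2)  # second row: conditioning on X2
    q13, q23 = qdot(x1, x3), qdot(x2, x3)  # third row: conditioning on X3
    a11 = 1.0 - q32 * q23
    a12 = -(q12 - q32 * q13)
    a13 = q12 * q23 - q13
    return a11, a12, a13

def table_coefficients(x1, x2, x3):
    """Coefficients of dx1, dx2, dx3 in the Cauchy quantile equation of Section 5."""
    return 1.0 + x2 * x2 + x3 * x3, -x1 * x2, -x1 * x3

def big_quantile(x2, x3, p0):
    """Assumed closed form of the quantile of X1 given (X2, X3) through p0."""
    x10, x20, x30 = p0
    return x10 * math.sqrt((1 + x2 * x2 + x3 * x3) / (1 + x20 * x20 + x30 * x30))

def residuals(x2, x3, p0, h=1e-6):
    """Coefficients of dx2 and dx3 after substituting x1 = big_quantile into omega."""
    x1 = big_quantile(x2, x3, p0)
    a11, a12, a13 = cofactors(x1, x2, x3)
    # central finite differences for the partial derivatives of the surface
    d2 = (big_quantile(x2 + h, x3, p0) - big_quantile(x2 - h, x3, p0)) / (2 * h)
    d3 = (big_quantile(x2, x3 + h, p0) - big_quantile(x2, x3 - h, p0)) / (2 * h)
    return a11 * d2 + a12, a11 * d3 + a13

if __name__ == "__main__":
    x = (0.7, -1.3, 2.1)
    scale = (1 + x[1] ** 2) * (1 + x[2] ** 2)  # common denominator of the cofactors
    print([scale * a for a in cofactors(*x)])
    print(list(table_coefficients(*x)))
    print(residuals(0.4, -0.9, p0=(1.5, 0.2, -0.6)))
```

The first two printed lists should agree, since the cofactors differ from the tabulated coefficients only by the common factor $(1 + x_2^2)(1 + x_3^2)$. Both residuals should be close to zero up to finite-difference error, in line with Theorem 1. Complete integrability in the general case is the subject of the Frobenius theorem.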
It states that equation (2) is completely integrable if and only if
\[
d\omega \wedge \omega = 0, \tag{3}
\]
where $d\omega$ is the exterior differential of the differential 1-form $\omega$ and $\wedge$ denotes the exterior product of two differential forms.

5 Examples

Many commonly used multivariate distributions have reproducible conditional quantiles. Examples include the multivariate Gaussian, Gamma, Student, Logistic and Pareto distributions and the Clayton copula (see [4]). To illustrate our results, we present here only a few specific distributions along with their quantile equations. All of the equations can be solved using well-known elementary methods (see, for example, [7]).

Table 1. Quantile equations for some distributions

\begin{tabular}{llll}
No. & Distribution & Density & Quantile equation \\
1 & Gaussian & $\varphi(x; m, [\sigma^{ij}])$ & $\sum_{k=1}^{n} \sigma^{1k}\, dx_k = 0$ \\
2 & Cauchy & $\dfrac{1}{\pi^2 (1 + x_1^2 + x_2^2 + x_3^2)^2}$ & $(1 + x_2^2 + x_3^2)\, dx_1 - x_1 x_2\, dx_2 - x_1 x_3\, dx_3 = 0$ \\
3 & Logistic & $\dfrac{6\, e^{-(x_1 + x_2 + x_3)}}{(1 + e^{-x_1} + e^{-x_2} + e^{-x_3})^4}$ & $(e^{x_2} + e^{x_3} + e^{x_2 + x_3})\, dx_1 - e^{x_3}\, dx_2 - e^{x_2}\, dx_3 = 0$ \\
4 & Pareto & $\dfrac{6}{(x_1 + x_2 + x_3 - 2)^4}$, $x_i \ge 1$, $i = 1, 2, 3$ & $(x_2 + x_3 - 1)\, dx_1 + (1 - x_1)\, dx_2 + (1 - x_1)\, dx_3 = 0$ \\
\end{tabular}

6 The Darboux Class for Quantile Differential Equations and Its Relation to Reproducibility

What if the quantile differential equation is not completely integrable? Are there cases when we can still say something about the solutions of this equation? To answer this question, let us first recall one of the characteristics of differential 1-forms (and, as a consequence, of Pfaffian differential equations): the Darboux class. As the Darboux theorem (see [3]) states, the class gives the maximum dimension of the integral manifold of the corresponding Pfaffian differential equation.

Definition 2.
If the differential 1-form $\omega$ satisfies
\[
\omega \wedge (d\omega)^r \ne 0, \quad \text{but} \quad \omega \wedge (d\omega)^{r+1} = 0,
\]
then we say that the Darboux class of the differential form equals $2r + 1$.

As we have already mentioned (see Section 4), the equality
\[
d\omega \wedge \omega = 0 \tag{4}
\]
gives the criterion for complete integrability of a Pfaffian equation (the Frobenius theorem). In this case $r = 0$, so the Darboux class of the differential form $\omega$ is equal to 1.

Theorem (Darboux's theorem). If the Darboux class of the differential 1-form $\omega$ equals $2r + 1$, then by a smooth local change of coordinates the Pfaffian equation
\[
\omega = \sum_{k=1}^{n} a_k(x_1, \ldots, x_n)\, dx_k = 0
\]
(with coefficients $a_k(x_1, \ldots, x_n)$ not equal to zero simultaneously) can be converted to the canonical form
\[
dy_1 + y_2\, dy_3 + \ldots + y_{2r}\, dy_{2r+1} = 0.
\]
In this case the Pfaffian equation has an integral manifold of maximum dimension $n - r - 1$ with the following first integrals:
\[
y_1(x_1, \ldots, x_n) = C_1 = \mathrm{const}, \quad y_3(x_1, \ldots, x_n) = C_3 = \mathrm{const}, \quad \ldots, \quad y_{2r+1}(x_1, \ldots, x_n) = C_{2r+1} = \mathrm{const}.
\]

As noted above, if condition (4) holds, then $r = 0$. So, according to Darboux's theorem, the maximum dimension of the integral manifold equals $n - 1$, which means that the equation is completely integrable. This agrees with the Frobenius theorem given earlier.

Let us now sum up. For a given quantile equation we can calculate the Darboux class of the corresponding differential 1-form. From this value we obtain the maximum dimension of the integral manifold. But what is the integral manifold itself? We will now show that for multivariate probability distributions with a certain type of quantile reproducibility the solutions can be given explicitly. To do this, we first establish one useful property of determinants (see [5]).

Let us consider the determinant of order $n$, where $n - 1 \ge k$, of the form
\[
M =
\begin{vmatrix}
dx_1 & dx_2 & \ldots & dx_k & dx_{k+1} & \ldots & dx_{n-1} & dx_n\\
1 & \dot{q}_{2|1} & \ldots & \dot{q}_{k|1} & \dot{q}_{k+1|1} & \ldots & \dot{q}_{n-1|1} & \dot{q}_{n|1}\\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots\\
\dot{q}_{1|k} & \dot{q}_{2|k} & \ldots & 1 & \dot{q}_{k+1|k} & \ldots & \dot{q}_{n-1|k} & \dot{q}_{n|k}\\
\dot{q}_{1|k+1} & \dot{q}_{2|k+1} & \ldots & \dot{q}_{k|k+1} & 1 & \ldots & \dot{q}_{n-1|k+1} & \dot{q}_{n|k+1}\\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots\\
\dot{q}_{1|n-1} & \dot{q}_{2|n-1} & \ldots & \dot{q}_{k|n-1} & \dot{q}_{k+1|n-1} & \ldots & 1 & \dot{q}_{n|n-1}
\end{vmatrix}.
\]
We denote the cofactor of the element $dx_i$ in the first row of $M$ by $A(dx_i)$.

Lemma 1. The following expansion of the determinant $M$ holds:
\[
M = \frac{A(dx_{k+1})}{S}
\begin{vmatrix}
dx_1 & dx_2 & \ldots & dx_k & dx_{k+1}\\
1 & \dot{q}_{2|1} & \ldots & \dot{q}_{k|1} & \dot{q}_{k+1|1}\\
\ldots & \ldots & \ldots & \ldots & \ldots\\
\dot{q}_{1|k} & \dot{q}_{2|k} & \ldots & 1 & \dot{q}_{k+1|k}
\end{vmatrix}
+ \ldots +
\frac{A(dx_n)}{S}
\begin{vmatrix}
dx_1 & dx_2 & \ldots & dx_k & dx_n\\
1 & \dot{q}_{2|1} & \ldots & \dot{q}_{k|1} & \dot{q}_{n|1}\\
\ldots & \ldots & \ldots & \ldots & \ldots\\
\dot{q}_{1|k} & \dot{q}_{2|k} & \ldots & 1 & \dot{q}_{n|k}
\end{vmatrix},
\tag{5}
\]
where
\[
S =
\begin{vmatrix}
1 & \dot{q}_{2|1} & \ldots & \dot{q}_{k|1}\\
\ldots & \ldots & \ldots & \ldots\\
\dot{q}_{1|k} & \dot{q}_{2|k} & \ldots & 1
\end{vmatrix}.
\]
The proof of this lemma is given in [5]. (The expansion holds for determinants of general form.)

Suppose $x^0 := (x_1^0, \ldots, x_n^0)$. For all natural numbers $s = \overline{1, n-1}$ and $i = \overline{s+1, n}$ we denote
\[
q^{(x^0)}_{i|1\ldots s}(x_1, \ldots, x_s) = q^{(x_1^0, \ldots, x_s^0, x_i^0)}_{i|1\ldots s}(x_1, \ldots, x_s).
\]
Next, for a random vector $X = (X_1, \ldots, X_n)$ we suppose that $k < n - 1$ of its variables are fixed. Without loss of generality we can assume that these are the first $k$ variables $x_1, \ldots, x_k$.

Theorem 2. If the probability distribution $F_{1\ldots n}(x_1, \ldots, x_n)$ has reproducible $k$-dimensional conditional quantiles, that is, for each $i = \overline{k+1, n}$
\[
\begin{aligned}
q^{(x^0)}_{i|1\ldots k}\bigl(x_1,\ q^{(x^0)}_{2|1}(x_1),\ q^{(x^0)}_{3|1}(x_1),\ \ldots,\ q^{(x^0)}_{k|1}(x_1)\bigr) &= q^{(x^0)}_{i|1}(x_1),\\
q^{(x^0)}_{i|1\ldots k}\bigl(q^{(x^0)}_{1|2}(x_2),\ x_2,\ q^{(x^0)}_{3|2}(x_2),\ \ldots,\ q^{(x^0)}_{k|2}(x_2)\bigr) &= q^{(x^0)}_{i|2}(x_2),\\
&\ \ \vdots\\
q^{(x^0)}_{i|1\ldots k}\bigl(q^{(x^0)}_{1|k}(x_k),\ q^{(x^0)}_{2|k}(x_k),\ \ldots,\ q^{(x^0)}_{k-1|k}(x_k),\ x_k\bigr) &= q^{(x^0)}_{i|k}(x_k),
\end{aligned}
\tag{6}
\]
and the determinant
\[
S =
\begin{vmatrix}
1 & \dot{q}^{(x^0)}_{2|1}(x_1) & \ldots & \dot{q}^{(x^0)}_{k|1}(x_1)\\
\ldots & \ldots & \ldots & \ldots\\
\dot{q}^{(x^0)}_{1|k}(x_k) & \dot{q}^{(x^0)}_{2|k}(x_k) & \ldots & 1
\end{vmatrix}
\ne 0, \tag{7}
\]
then the surface
\[
\Gamma_k^{(x^0)}(x_1, \ldots, x_k) = \bigl\{ x_1,\ \ldots,\ x_k,\ q^{(x^0)}_{k+1|1\ldots k}(x_1, \ldots, x_k),\ \ldots,\ q^{(x^0)}_{n|1\ldots k}(x_1, \ldots, x_k) \bigr\}, \tag{8}
\]
constructed from the $k$-dimensional quantiles, is a $k$-dimensional solution of the quantile equation (2).

Proof. Let us restrict our consideration to the $(k+1)$-dimensional marginal probability distribution $F_{1\ldots k\, i}(x_1, \ldots, x_k, x_i)$. Then the $k$-dimensional conditional quantile
\[
q^{(x^0)}_{i|1\ldots k}(x_1, \ldots, x_k), \quad i = \overline{k+1, n}, \tag{9}
\]
is the "big" quantile of the probability distribution with the corresponding conditional distribution function $F_{i|1\ldots k}(x_i \mid x_1, \ldots, x_k)$. So, by (6), this distribution has a reproducible "big" conditional quantile. From condition (7) and Theorem 1 we conclude that for this distribution the "big" conditional quantile (9) is the solution of the Pfaffian differential equation
\[
w_i(x_1, \ldots, x_k, x_i) =
\begin{vmatrix}
dx_1 & dx_2 & \ldots & dx_k & dx_i\\
1 & \dot{q}^{(x^0)}_{2|1}(x_1) & \ldots & \dot{q}^{(x^0)}_{k|1}(x_1) & \dot{q}^{(x^0)}_{i|1}(x_1)\\
\ldots & \ldots & \ldots & \ldots & \ldots\\
\dot{q}^{(x^0)}_{1|k}(x_k) & \dot{q}^{(x^0)}_{2|k}(x_k) & \ldots & 1 & \dot{q}^{(x^0)}_{i|k}(x_k)
\end{vmatrix}
= 0,
\]
that is,
\[
w_i\bigl(x_1, \ldots, x_k, q^{(x^0)}_{i|1\ldots k}(x_1, \ldots, x_k)\bigr) \equiv 0, \quad i = \overline{k+1, n}. \tag{10}
\]
Let us now consider the quantile equation for the initial distribution:
\[
w(x_1, \ldots, x_n) =
\begin{vmatrix}
dx_1 & \ldots & dx_k & \ldots & dx_{n-1} & dx_n\\
1 & \ldots & \dot{q}^{(x^0)}_{k|1}(x_1) & \ldots & \dot{q}^{(x^0)}_{n-1|1}(x_1) & \dot{q}^{(x^0)}_{n|1}(x_1)\\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots\\
\dot{q}^{(x^0)}_{1|n-1}(x_{n-1}) & \ldots & \dot{q}^{(x^0)}_{k|n-1}(x_{n-1}) & \ldots & 1 & \dot{q}^{(x^0)}_{n|n-1}(x_{n-1})
\end{vmatrix}
= 0.
\]
We can apply Lemma 1 to expand the left-hand side of the equation:
\[
w(x_1, \ldots, x_n) = \frac{A(dx_{k+1})}{S} \cdot w_{k+1}(x_1, \ldots, x_k, x_{k+1}) + \ldots + \frac{A(dx_n)}{S} \cdot w_n(x_1, \ldots, x_k, x_n).
\]
Now, using (10), we get
\[
w\bigl(x_1, \ldots, x_k, q^{(x^0)}_{k+1|1\ldots k}(x_1, \ldots, x_k), \ldots, q^{(x^0)}_{n|1\ldots k}(x_1, \ldots, x_k)\bigr) \equiv 0.
\]
Therefore, the surface (8) is an integral manifold of the initial quantile equation (2).

Note 1.
If the conditions of the theorem are satisfied, the quantile equation (2) has a solution of dimension $k$. Therefore the maximum possible dimension of an integral manifold of the equation is not less than $k$ and, consequently, the Darboux class of the 1-form $\omega$ is less than or equal to $2(n - k) - 1$. When the Darboux class of the quantile equation is equal to $2(n - k) - 1$, the surface (8) is the integral manifold of (2) of the maximum possible dimension going through the point $x^0$.

Note 2. If we add to (6) the condition
\[
q^{(x^0)}_{n|1\ldots n-1}\bigl(x_1, \ldots, x_k, q^{(x^0)}_{k+1|1\ldots k}(x_1, \ldots, x_k), \ldots, q^{(x^0)}_{n-1|1\ldots k}(x_1, \ldots, x_k)\bigr) \equiv q^{(x^0)}_{n|1\ldots k}(x_1, \ldots, x_k),
\]
then the integral manifold (8) takes the form
\[
\Bigl\{ x_1,\ \ldots,\ x_k,\ q^{(x^0)}_{k+1|1\ldots k}(x_1, \ldots, x_k),\ \ldots,\ q^{(x^0)}_{n-1|1\ldots k}(x_1, \ldots, x_k),\
q^{(x^0)}_{n|1\ldots n-1}\bigl(x_1, \ldots, x_k, q^{(x^0)}_{k+1|1\ldots k}(x_1, \ldots, x_k), \ldots, q^{(x^0)}_{n-1|1\ldots k}(x_1, \ldots, x_k)\bigr) \Bigr\}
\]
and is a part of the "big" conditional quantile surface
\[
\bigl\{ x_1,\ x_2,\ \ldots,\ x_{n-1},\ q^{(x^0)}_{n|1\ldots n-1}(x_1, \ldots, x_{n-1}) \bigr\}.
\]

7 Example of the Distribution with an Intermediate Darboux Class

To illustrate Theorem 2, let us consider a mixture of two 5-dimensional Cauchy distributions with density
\[
f(x_1, x_2, x_3, x_4, x_5) = \frac{1}{3}\, c^{(1)}(x_1, x_2, x_3, x_4, x_5) + \frac{2}{3}\, c^{(2)}(x_1, x_2, x_3, x_4, x_5),
\]
where
\[
c^{(1)} = \frac{12\sqrt{15}}{\pi^3 \bigl(1 + x_1^2 + 3x_2^2 + 4x_3^2 + \tfrac{45}{2} x_4^2 - 15 x_4 x_5 + \tfrac{9}{2} x_5^2\bigr)^3},
\]
\[
c^{(2)} = \frac{40\sqrt{3}}{\pi^3 \bigl(1 + x_1^2 + 3x_2^2 + 4x_3^2 + 50 x_4^2 + 40 x_4 x_5 + 10 x_5^2\bigr)^3}.
\]
We have the following quantile equation:
\[
\omega =
\begin{vmatrix}
dx_1 & dx_2 & dx_3 & dx_4 & dx_5\\[2pt]
1 & \dfrac{x_1 x_2}{1 + x_1^2} & \dfrac{x_1 x_3}{1 + x_1^2} & \dfrac{x_1 x_4}{1 + x_1^2} & \dfrac{x_1 x_5}{1 + x_1^2}\\[8pt]
\dfrac{3 x_1 x_2}{1 + 3 x_2^2} & 1 & \dfrac{3 x_2 x_3}{1 + 3 x_2^2} & \dfrac{3 x_2 x_4}{1 + 3 x_2^2} & \dfrac{3 x_2 x_5}{1 + 3 x_2^2}\\[8pt]
\dfrac{4 x_1 x_3}{1 + 4 x_3^2} & \dfrac{4 x_2 x_3}{1 + 4 x_3^2} & 1 & \dfrac{4 x_3 x_4}{1 + 4 x_3^2} & \dfrac{4 x_3 x_5}{1 + 4 x_3^2}\\[8pt]
\dfrac{10 x_1 x_4}{1 + 10 x_4^2} & \dfrac{10 x_2 x_4}{1 + 10 x_4^2} & \dfrac{10 x_3 x_4}{1 + 10 x_4^2} & 1 & \dot{q}^{(x_4, x_5)}_{5|4}(x_4)
\end{vmatrix}
= 0, \tag{11}
\]
where $\dot{q}^{(x_4, x_5)}_{5|4}(x_4) = A/B$ with
\[
A = (1 + 10 x_4^2)^{-1} \cdot \Bigl[ 2\sqrt{10}\, (-1 + 5 x_4 x_5) \bigl(2 + 45 x_4^2 - 30 x_4 x_5 + 9 x_5^2\bigr)^{3/2} + 5\, (1 + 6 x_4 x_5) \bigl(1 + 10 (5 x_4^2 + 4 x_4 x_5 + x_5^2)\bigr)^{3/2} \Bigr],
\]
\[
B = \sqrt{10}\, \bigl(2 + 45 x_4^2 - 30 x_4 x_5 + 9 x_5^2\bigr)^{3/2} + 3 \bigl(1 + 10 (5 x_4^2 + 4 x_4 x_5 + x_5^2)\bigr)^{3/2}.
\]
Now let us calculate the Darboux class of the form $\omega$ to determine the maximum dimension of the solution of the quantile equation. The calculations show that
\[
d\omega \ne 0; \qquad d\omega \wedge \omega \ne 0 \ \text{almost surely in } \mathbb{R}^5; \qquad d\omega \wedge d\omega \equiv 0.
\]
So the Darboux class of the form $\omega$ is almost surely equal to 3, and the maximum dimension of the solution of the quantile equation (11) is also almost surely equal to 3. The integral manifold of the maximum possible dimension going through the point $x^\circ = (x_1^\circ, \ldots, x_5^\circ)$ is given by the equalities
\[
x_4 = x_4^\circ \sqrt{\frac{1 + x_1^2 + 3 x_2^2 + 4 x_3^2}{1 + (x_1^\circ)^2 + 3 (x_2^\circ)^2 + 4 (x_3^\circ)^2}}; \qquad
x_5 = x_5^\circ \sqrt{\frac{1 + x_1^2 + 3 x_2^2 + 4 x_3^2}{1 + (x_1^\circ)^2 + 3 (x_2^\circ)^2 + 4 (x_3^\circ)^2}},
\]
which exactly match the two 3-dimensional conditional quantiles going through $x^\circ$: $q^{(x^\circ)}_{4|123}(x_1, x_2, x_3)$ and $q^{(x^\circ)}_{5|123}(x_1, x_2, x_3)$. That is, the solution is
\[
S^{(x^\circ)} = \bigl\{ x_1,\ x_2,\ x_3,\ q^{(x^\circ)}_{4|123}(x_1, x_2, x_3),\ q^{(x^\circ)}_{5|123}(x_1, x_2, x_3) \bigr\}.
\]
Finally, it is easy to verify that the 3-dimensional conditional quantiles satisfy the quantile reproducibility property. So, according to Theorem 2, $S^{(x^\circ)}$ should be a solution of the quantile equation. Since the Darboux class of the form $\omega$ is equal to 3, this is, as stated in Note 1, the solution of the maximum possible dimension.
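A brief aside on why the manifold above has such a simple closed form. The sketch below is an illustration only and rests on one standard fact that is not stated in the text: integrating one coordinate out of a multivariate Cauchy density replaces its quadratic form by the Schur complement with respect to that coordinate. Under this assumption, the two mixture components can be compared through the $(x_4, x_5)$ blocks of their quadratic forms alone, because the terms in $x_1, x_2, x_3$ are the same in both. The helper name `schur_pair` is hypothetical.

```python
from fractions import Fraction as Fr

def schur_pair(a, b, c):
    """For the form a*x4^2 + 2*b*x4*x5 + c*x5^2, return the coefficient of
    x4^2 left after integrating out x5, and of x5^2 after integrating out x4."""
    return a - b * b / c, c - b * b / a

# (x4, x5) blocks of the quadratic forms in c^(1) and c^(2)
component_1 = schur_pair(Fr(45, 2), Fr(-15, 2), Fr(9, 2))  # 45/2 x4^2 - 15 x4 x5 + 9/2 x5^2
component_2 = schur_pair(Fr(50), Fr(20), Fr(10))           # 50 x4^2 + 40 x4 x5 + 10 x5^2

if __name__ == "__main__":
    print(component_1, component_2)
```

Both components give the same pair of coefficients, 10 for $x_4^2$ and 2 for $x_5^2$. So, under the stated assumption, the two components share their marginals for $(X_1, X_2, X_3, X_4)$ and for $(X_1, X_2, X_3, X_5)$, and the mixture differs from a single Cauchy distribution only in the joint behaviour of $(X_4, X_5)$. The coefficient 10 agrees with the entries $10 x_i x_4 / (1 + 10 x_4^2)$ in the last row of (11).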
It is also easy to show that the condition of Note 2 is satisfied, so $S^{(x^\circ)}$ is a part of the "big" conditional quantile surface:
\[
S^{(x^\circ)} \subseteq \bigl\{ x_1,\ x_2,\ x_3,\ x_4,\ q^{(x^\circ)}_{5|1234}(x_1, x_2, x_3, x_4) \bigr\}.
\]

8 Statistical Application of Quantile Reproducibility

Theorem 1 shows how to obtain the $(n-1)$-dimensional quantile from a set of 1-dimensional quantile derivatives, which can be calculated from the bivariate marginal densities of the distribution. This suggests building a statistical estimate of the "big" conditional quantile of the distribution from estimates of its bivariate densities. The outline of the algorithm is as follows:

• First, estimate the 1-dimensional quantiles of the distribution and their derivatives from a number of bivariate observations (see, for example, [2]).

• Then, using these quantile derivative estimates, build the Pfaffian quantile equation (2) so that it is completely integrable and its solution approximates the "big" conditional quantile.

• Solve the quantile equation numerically and obtain the n-dimensional quantile estimate.

This algorithm needs only a set of 2-dimensional observations, all of which can be made independently. With the direct approach to estimating the $(n-1)$-dimensional quantile, one would need to observe the entire vector of $n$ dimensions.

Now let us roughly compare the number of observations required to build the $(n-1)$-dimensional quantile estimate with the traditional direct approach and with the algorithm described above. For convenience we assume that in both cases the quantiles and quantile derivatives are estimated by first estimating the corresponding distribution densities and then deriving the conditional densities from these estimates.

With the traditional approach we consider a sample of observations
\[
\bigl\{ (x_1^{(1)}, \ldots, x_n^{(1)}),\ \ldots,\ (x_1^{(r_n)}, \ldots, x_n^{(r_n)}) \bigr\}
\]
for a random vector $X = (X_1, \ldots, X_n)$ with the density $f_{1\ldots n}(x_1, \ldots, x_n)$. From the observations we construct the density estimate $\hat{f}_{1\ldots n}(x_1, \ldots, x_n)$. To simplify the calculations we suppose that the histogram method is used. For each variable $X_i$ we divide the sample range into $m$ intervals. This way we get $m^n$ $n$-dimensional parallelepipeds. If we need $k$ observations in each parallelepiped to get a good estimate, then the total number of observations required to estimate the density of $X$ is
\[
r_n = k \cdot m^n.
\]
If we use the algorithm proposed above, we do not need to estimate the $n$-variate density $f_{1\ldots n}(x_1, \ldots, x_n)$. Instead we construct estimates of the marginal densities
\[
f_{ij}(x_i, x_j), \quad i \ne j, \quad i, j = \overline{1, n},
\]
using the observations
\[
\bigl\{ (x_i^{(1)}, x_j^{(1)}),\ \ldots,\ (x_i^{(r_2)}, x_j^{(r_2)}) \bigr\}.
\]
If again we need $k$ observations in each of the rectangles for a good estimate, then each of the density estimates $\hat{f}_{ij}(x_i, x_j)$ requires $r_2 = k m^2$ observations, and the total number of observations required to estimate all the bivariate densities equals
\[
s_n = r_2 \cdot \frac{n (n - 1)}{2} = k m^2 \cdot \frac{n (n - 1)}{2}.
\]
It is clear that for $n \ge 3$
\[
\lim_{m \to \infty} \frac{r_n}{s_n} = \infty,
\]
which means that, when $n$ is relatively large, the overall number of observations required to construct an n-dimensional quantile is much smaller if the proposed algorithm is used.

Acknowledgments

This work was partially supported by a grant of RFBR (project 16-01-00184 A).

References

1. Cartan H. Differential forms. Houghton Mifflin, Boston, 1970.
2. Chaudhuri P. Global nonparametric estimation of conditional quantile functions and their derivatives. Journal of Multivariate Analysis, 1991; 39(2): 246-269.
3. Godbillon C. Géométrie différentielle et mécanique analytique. Hermann, Paris, 1969.
4. Kotz S, Balakrishnan N, Johnson NL. Continuous Multivariate Distributions, Models and Applications. John Wiley & Sons, 2000; 1.
5. Melkumova LE, Shatskikh SYa. Solving not completely integrable quantile Pfaffian differential equations. Vestnik SamGU, Natural sciences series, 2012; 1(3): 20-39. [In Russian]
6. Orlova IS, Shatskikh SYa. Pfaffian differential equations for conditional quantiles of multivariate probability distributions. Vestnik SamGU, Natural sciences series, 2010; 2(76): 32-47. [In Russian]
7. Reinhard H. Équations différentielles: fondements et applications. Dunod, 2nd edition, 1989.
8. Shatskikh SYa. A necessary condition of conditional quantile reproducibility of multivariate probability distributions. Izvestiya RAEN, MMMIU, 2000; 4(4): 67-72. [In Russian]
9. Shatskikh SYa, Knutova EM. Conditional quantile reproducibility of the multivariate Student's distribution. Izvestiya RAEN, MMMIU, 1997; 1(1): 36-58. [In Russian]
10. Zacks S. The theory of statistical inference. John Wiley & Sons, New York, 1971.