Ellipsoidal distribution-free set

Dmitriy Klyushin 1,* and Andrii Tymoshenko 1,*

1 Taras Shevchenko National University of Kyiv, 60 Volodymyrska Street, 01033, Kyiv, Ukraine

Abstract
This paper introduces a distribution-free approach based on Hill's assumption and Petunin ellipsoids. Several distributions are used to generate points and build ellipsoids, and test points drawn from the same distribution are then checked for membership in the largest ellipsoid. As a result, a new prediction set is constructed in the form of a Petunin ellipsoid, whose confidence level is determined by the number of points. The method described here works effectively for the chosen distributions. Moreover, a statistical analysis of the number of covered points is performed. This method is a useful tool for solving many urgent problems of machine learning, e.g. generalization of training samples, effective cross-validation, etc.

Keywords
data mining, prediction set, Petunin ellipsoid, outlier detection

Information Technology and Implementation (IT&I-2024), November 20-21, 2024, Kyiv, Ukraine
* Corresponding author. These authors contributed equally.
dmytroklyushin@knu.ua (D. Klyushin); andriitymoshenko@knu.ua (A. Tymoshenko)
0000-0003-4554-1049 (D. Klyushin); 0000-0002-8884-9054 (A. Tymoshenko)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Construction of prediction sets is a popular problem in machine learning, often studied together with neural networks, and in recent years many researchers have addressed it. For example, Adam Khakhar, Stephen Mell and Osbert Bastani [1] used a trained code generation model in an algorithm that leverages the abstract syntax tree of the programming language to create a set of programs that contains the correct program with high confidence. As another example, Soroush H. Zargarbashi, Mohammad Sadegh Akhondzadeh and Aleksandar Bojchevski [2] derive provably robust prediction sets by bounding the worst-case change of the conformity scores.

Another important direction is distribution-free classification and prediction sets. Here we can mention the work of Chirag Gupta, Aleksandr Podkopaev and Aaditya Ramdas [3], who study calibration and prediction sets combined with confidence intervals. Their research addresses binary classification with distribution-free sets; based on the demonstrated theorems, confidence intervals for binned probabilities allow distribution-free calibration. In [4], Hongxiang Qiu, Edgar Dobriban and Eric Tchetgen Tchetgen offer PredSet-1Step, a novel flexible distribution-free method for constructing prediction sets whose asymptotic coverage is guaranteed under unknown covariate shift. In [5], A. Angelopoulos, S. Bates, J. Malik and M. I. Jordan present an algorithm that modifies a chosen classifier to output a predictive set containing the true label with a user-specified probability; this simple and fast algorithm is reminiscent of Platt scaling but yields a formal finite-sample coverage guarantee for every model and dataset. The approach described in [6] chooses the size of the prediction sets by analyzing a holdout set and leads to explicit finite-sample guarantees. As a result, simple, distribution-free and rigorous error control is obtained for many tasks, demonstrated on five large-scale machine learning problems.
Further works related to prediction offer various approaches: prediction based on language models [7], neural networks compared with calibrated predictions [8], distribution-free uncertainty quantification and conformal prediction [9], conformal risk control [10], conformal predictors applied to medical imaging [11], confident prediction under distribution shift [12], conformal prediction robust to label noise [13], and conformal prediction via probabilistic circuits [14]. Related topics are also worth mentioning: uncertainty quantification over graphs using conformalized graph neural networks [15], adversarial robustness of randomly smoothed classifiers [16], randomized smoothing for graphs and images [17], and adversarially trained smoothed classifiers [18].

The purpose of our paper is to describe a method for constructing an ellipsoidal prediction set for randomly generated points drawn from a chosen distribution. The main tools of our forecast are predictive sets represented by ellipses constructed from the generated points. Test points are generated from the same distribution; the more ellipses contain a point, the higher the expected probability that it belongs to the same class.

Consider the problem of conformal prediction based on points $x_1, x_2, \dots, x_m \in \mathbb{R}^d$. The aim is to find a prediction set $E(x_1, x_2, \dots, x_m) \subset \mathbb{R}^d$ such that $p(x_{m+1} \in E) \ge 1 - \alpha$, where $0 < \alpha < 1$ is a chosen significance level, so that $p(x_{m+1} \in E)$ is the confidence level of the predictive set.

2. Hill's Assumption A(m)

Let $x_1, x_2, \dots, x_m$ denote a sample drawn from a population with an absolutely continuous distribution $F$. Arranging it in increasing order, we obtain the variational series $x_{(1)} \le x_{(2)} \le \dots \le x_{(m)}$, where $x_{(i)}$ is the $i$-th order statistic. The resulting order statistics $x_{(1)}, x_{(2)}, \dots, x_{(m)}$ are dependent. The distribution $F_k(u)$ of the $k$-th order statistic $x_{(k)}$ can be calculated as

$$F_k(u) = \sum_{i=k}^{m} C_m^i \left(F(u)\right)^i \left(1 - F(u)\right)^{m-i},$$

where $F(u) = p(x < u)$.

Hill's assumption $A(m)$ [19] states that if the sample is drawn from the population according to the distribution $F$, then

$$p\left(x_{m+1} \in \left(x_{(i)}, x_{(j)}\right)\right) = \frac{j - i}{m + 1}, \quad j > i. \qquad (1)$$

$A(m)$ was proven in the papers of Yu. I. Petunin [20] and by several other scientists.
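Property (1) is easy to check empirically before turning to the proof. The following minimal Python sketch (our illustration, not part of the original derivation; the sample size, the indices and the choice of the standard normal distribution are arbitrary) estimates the coverage probability by Monte Carlo and compares it with $(j - i)/(m + 1)$:

```python
import numpy as np

def hill_coverage(m=20, i=3, j=15, trials=100_000, seed=0):
    """Monte Carlo estimate of p(x_{m+1} in (x_(i), x_(j))) for an i.i.d. sample.

    By Hill's assumption A(m), the result should be close to (j - i)/(m + 1)
    for any absolutely continuous distribution; N(0, 1) is used here.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        x = np.sort(rng.standard_normal(m))  # order statistics x_(1), ..., x_(m)
        x_new = rng.standard_normal()        # independent (m + 1)-th observation
        hits += x[i - 1] < x_new < x[j - 1]  # i, j are 1-based indices, i < j
    return hits / trials

print(hill_coverage())        # empirical coverage, approx. 0.571
print((15 - 3) / (20 + 1))    # theoretical value (j - i)/(m + 1) = 12/21
```

Replacing the normal generator with any other continuous distribution leaves the estimate unchanged up to Monte Carlo error, which is exactly the distribution-free nature of $A(m)$.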
Let us recall the proof for random variables $\xi$ and $\eta$. If they are independent, then

$$p(\xi < \eta) = \int_{-\infty}^{\infty} F_{\xi}(u)\, dF_{\eta}(u), \qquad (2)$$

where $F_{\xi}(u)$ and $F_{\eta}(u)$ denote the distribution functions of $\xi$ and $\eta$, respectively.

Writing $G_i(u) = C_m^i \left(F(u)\right)^i \left(1 - F(u)\right)^{m-i}$, so that $F_k(u) = \sum_{i=k}^{m} G_i(u)$, the probability density of the $k$-th order statistic is

$$f_k(u) = \sum_{i=k}^{m} G_i'(u). \qquad (3)$$

Here

$$G_k'(u) = C_m^k \left[ k \left(F(u)\right)^{k-1} \left(1 - F(u)\right)^{m-k} f(u) - (m-k) \left(F(u)\right)^{k} \left(1 - F(u)\right)^{m-k-1} f(u) \right], \qquad (4)$$

$$G_{k+1}'(u) = C_m^{k+1} \left[ (k+1) \left(F(u)\right)^{k} \left(1 - F(u)\right)^{m-k-1} f(u) - (m-k-1) \left(F(u)\right)^{k+1} \left(1 - F(u)\right)^{m-k-2} f(u) \right].$$

The second term of $G_k'$ is compensated by the first term of $G_{k+1}'$:

$$-C_m^k (m-k) + C_m^{k+1} (k+1) = -\frac{m!\,(m-k)}{(m-k)!\,k!} + \frac{m!\,(k+1)}{(m-k-1)!\,(k+1)!} = -\frac{m!}{(m-k-1)!\,k!} + \frac{m!}{(m-k-1)!\,k!} = 0.$$

The last term of the sum (for $i = m$) is equal to zero, since $\left(F(u)\right)^m \left(1 - F(u)\right)^0 (m - m) f(u) = 0$. Thus the sum telescopes to

$$f_k(u) = m C_{m-1}^{k-1} \left(F(u)\right)^{k-1} \left(1 - F(u)\right)^{m-k} f(u).$$

Let us find $p(x < x_{(i)})$ and $p(x < x_{(j)})$. Using the above equations, we have

$$p\left(x < x_{(i)}\right) = \int_{-\infty}^{\infty} F(u)\, dF_i(u) = m C_{m-1}^{i-1} \int_{-\infty}^{\infty} F(u) \left(F(u)\right)^{i-1} \left(1 - F(u)\right)^{m-i} dF(u) = m C_{m-1}^{i-1} \int_0^1 v^{i} (1 - v)^{m-i}\, dv.$$

It is known that

$$\int_0^1 x^{p-1} (1 - x)^{q-1}\, dx = \frac{\Gamma(p)\,\Gamma(q)}{\Gamma(p+q)} = \frac{(p-1)!\,(q-1)!}{(p+q-1)!}$$

for integer $p$ and $q$. Applying this with $p = i + 1$ and $q = m - i + 1$, we get

$$\int_0^1 x^{i+1-1} (1 - x)^{m-i+1-1}\, dx = B(i+1, m-i+1) = \frac{\Gamma(i+1)\,\Gamma(m-i+1)}{\Gamma(m+2)} = \frac{i!\,(m-i)!}{(m+1)!},$$

hence

$$p\left(x_{m+1} < x_{(j)}\right) = m C_{m-1}^{j-1} \frac{j!\,(m-j)!}{(m+1)!} = \frac{m\,(m-1)!}{(j-1)!\,(m-j)!} \cdot \frac{j!\,(m-j)!}{(m+1)!} = \frac{j}{m+1},$$

where the last step uses $j!/(j-1)! = j$ and $m\,(m-1)! = m!$. So we get

$$p\left(x_{m+1} \in \left(x_{(i)}, x_{(j)}\right)\right) = p\left(x_{m+1} < x_{(j)}\right) - p\left(x_{m+1} < x_{(i)}\right) = \frac{j}{m+1} - \frac{i}{m+1} = \frac{j-i}{m+1}.$$

Thus, if a random variable $x$ is independent of $x_1, x_2, \dots, x_m$ and is sampled from the same population with distribution $F(u)$, then

$$p\left(x \in \left(x_{(1)}, x_{(m)}\right)\right) = \frac{m-1}{m+1}.$$

Remark 1. The confidence level of the tolerance interval $\left(x_{(1)}, x_{(m)}\right)$ is $\frac{m-1}{m+1}$; thus for $m \ge 39$ the significance level $\frac{2}{m+1}$ of this interval does not exceed 0.05.

3. Petunin Ellipsoids

The algorithm for constructing an ellipsoid that contains a set of $m$ random points with probability $\frac{m-1}{m+1}$ was proposed by Yu. I. Petunin. Statistical and geometrical properties of Petunin ellipsoids were investigated in [21]. Here we describe the two-dimensional case; a code sketch of the construction is given at the end of this section.

First, we find the two points $x_k$ and $x_l$ of the set $M_m = \{x_1, \dots, x_m\}$ that are farthest from each other and connect them with a line segment (referred to below as the diameter). To simplify the construction, we can rotate all points around the center of this segment so that the diameter becomes horizontal, rotating everything back at the end (Figure 1).

Figure 1: The two points farthest from each other

Next, we find the points farthest from the diameter and construct lines through them parallel to the diameter, together with lines orthogonal to the diameter passing through its endpoints. As a result, we obtain a rectangle that covers the given set of points and lies in the two-dimensional plane (Figure 2).

Figure 2: Rectangle covering the images of the points

Dividing the shorter side length by the longer one gives the shrinking coefficient. Translating, rotating and shrinking the rectangle as described above produces a square covering the images of the given points (Figure 3).

Figure 3: Square covering the images of the points

Find the center of the square and the distances from it to every image of a point. Then take the maximal of these distances and create a circle whose center coincides with the center of the square and whose radius equals this maximal distance (Figure 4).

Figure 4: The circle covering the images of the points

Performing the inverse transformations of this circle yields the Petunin ellipse (Figure 5).

Figure 5: Petunin ellipse covering the initial points

In the high-dimensional case, we construct a minimum-volume axis-aligned orthogonal parallelepiped containing the images of the initial points, perform a shrinking transformation from the parallelepiped to a hypercube, find the center of the hypercube and the distances from it to the images, and take the maximum distance. After that, we construct a hypersphere centered at the hypercube center with radius equal to this maximum distance. Performing the inverse transformations (translation, rotation and stretching), we obtain the Petunin ellipsoid $E_m$. Since Hill's assumption $A(m)$ holds, $P(x_{m+1} \in E_m) = \frac{m-1}{m+1}$.
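To make the construction concrete, here is a minimal two-dimensional sketch in Python/NumPy (our illustration; the function names, the brute-force diameter search and the packaging of the transform parameters are our own simplifications, and degenerate samples with a zero-width bounding rectangle are not handled):

```python
import numpy as np

def petunin_ellipse(points):
    """Sketch of the 2D Petunin construction: returns the rotation, shift and
    scaling that normalize the data, plus the covering-circle radius and the
    per-point distances to the square's center."""
    pts = np.asarray(points, dtype=float)
    # Step 1: farthest pair of points (the diameter); brute force, O(m^2).
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    k, l = np.unravel_index(np.argmax(d), d.shape)
    # Step 2: rotate all points so the diameter becomes horizontal.
    dx, dy = pts[l] - pts[k]
    a = np.arctan2(dy, dx)
    R = np.array([[np.cos(a), np.sin(a)], [-np.sin(a), np.cos(a)]])  # rotation by -a
    q = pts @ R.T
    # Step 3: axis-aligned bounding rectangle of the rotated points.
    lo, hi = q.min(axis=0), q.max(axis=0)
    side = hi - lo
    # Step 4: shrink the longer side so the rectangle becomes a square.
    scale = side.min() / side            # 1 on the short axis, < 1 on the long one
    q = (q - lo) * scale
    # Step 5: covering circle centered at the center of the square.
    center = np.full(2, 0.5 * side.min())
    r = np.linalg.norm(q - center, axis=1)
    return R, lo, scale, center, r.max(), r

def contains(point, R, lo, scale, center, radius):
    """Check whether a new point falls inside the resulting Petunin ellipse."""
    q = (np.asarray(point, dtype=float) @ R.T - lo) * scale
    return np.linalg.norm(q - center) <= radius
```

The array of distances returned by petunin_ellipse is exactly what the next paragraph calls statistical depth: the point with the smallest distance is the deepest one, and the point attaining the maximal distance lies on the boundary of the ellipse.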
Since at the last stage of the construction we obtain concentric spheres, each with one unique point on its surface, the Petunin ellipsoids allow us to arrange the points by their statistical depth. The median point of the set (the most probable point) is the point nearest to the center of the Petunin ellipsoid, and the outlier is the point on the boundary of the Petunin ellipsoid.

4. Numerical results

This section describes the testing results. First, we generate a set of 1000 points from a chosen distribution and build the ellipses through each point. Then we generate 1000 more points from the same distribution and check how many of them fall inside the largest ellipse. Statistical characteristics of these results are shown below for three different distributions; a sketch of the experiment is given at the end of this section.

4.1. Normal distribution

The first test was performed on approximately normal points with a 3 to 1 ratio between the horizontal and vertical scales. We generate 12 random numbers from 0 to 1, calculate their sum and subtract 6; the values are then multiplied and shifted so that the horizontal and vertical coordinates are generated in the proportion 3 to 1.

Expected probability    0.99762
Mean                    0.99762
Mean deviation          0.001873
Mode                    0.996477
Median                  0.998
Standard deviation      0.0022867
Variance                0.0000052279
Kurtosis                3.24629
Skewness                -0.892779
Range                   0.01
Maximum                 1
Minimum                 0.989
Geometric mean          0.9976174
Harmonic mean           0.9976148

Figure 6: Average ellipse areas (normal distribution, 100 tests)

As we can see, the largest ellipse contains almost all points, and the areas grow slowly until the last 50 points defining the largest ellipses.

4.2. Exponential distribution

Exponential points were generated by the inverse transform method, with the parameters -17 and -50 used as multipliers of the logarithm of a random value from 0 to 1 for the horizontal and vertical coordinates, respectively.

Expected probability    0.998158
Mean                    0.998158
Mean deviation          0.00138457
Mode                    0.99943
Median                  0.999
Standard deviation      0.0018747
Variance                0.00000351465
Kurtosis                5.98476568
Skewness                -1.5228786
Range                   0.01
Maximum                 1
Minimum                 0.989999
Geometric mean          0.998
Harmonic mean           0.998

Figure 7: Average ellipse areas (exponential distribution, 100 tests)

Here the largest ellipse again contains almost all points, but the ellipse areas grow much faster.

4.3. Gamma distribution

Gamma-distributed pseudorandom numbers were used with parameters 50 and 90 for the horizontal and vertical coordinates, respectively.

Expected probability    0.99776
Mean                    0.99776
Mean deviation          0.0017616
Mode                    0.998438
Median                  0.998
Standard deviation      0.002399
Variance                0.0000057599
Kurtosis                5.95855
Skewness                -1.65869
Range                   0.012
Maximum                 1
Minimum                 0.987999
Geometric mean          0.997757
Harmonic mean           0.997754

Figure 8: Average ellipse areas (gamma distribution, 100 tests)

In this test the ellipse contains almost all points, and the ellipse areas increase very fast after the ellipse based on 800 points.

More tests were performed for these distributions with other parameters, and the results were alike: ellipse areas increased smoothly at first, but a faster increase was observed for the last 100-200 most distant points. As for accuracy, we expected values of approximately 0.998 and obtained matching results.
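The coverage part of the experiment can be reproduced with a short script. The sketch below is our illustration: it reuses the hypothetical petunin_ellipse and contains helpers from the sketch in Section 3, mimics the sum-of-12-uniforms generator of Section 4.1, and builds the single ellipse on all 1000 training points (i.e. the largest one), measuring the fraction of 1000 fresh points it covers for comparison with $(m-1)/(m+1) = 999/1001 \approx 0.998$:

```python
import numpy as np

rng = np.random.default_rng(42)

def normal_points(n, sx=3.0, sy=1.0):
    """Approximate N(0, 1) coordinates via the sum of 12 uniforms minus 6,
    scaled 3:1 horizontally vs. vertically, as in Section 4.1."""
    z = rng.random((n, 2, 12)).sum(axis=-1) - 6.0
    return z * np.array([sx, sy])

train = normal_points(1000)
test = normal_points(1000)

# Largest Petunin ellipse, built on the whole training sample
# (petunin_ellipse and contains are the helpers sketched in Section 3).
R, lo, scale, center, radius, _ = petunin_ellipse(train)
covered = np.mean([contains(p, R, lo, scale, center, radius) for p in test])
print(f"empirical coverage: {covered:.4f}")  # theory: 999/1001 = 0.998
```

Repeating this over 100 independent runs and collecting the covered fractions yields statistics directly comparable to the tables above.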
The confidence level reached is theoretically precise for the tested distributions. The construction also allows us to compute the statistical depth of every point and to detect outliers of the set. The experimental results confirmed the theoretical properties of the Petunin ellipses.

Declaration on Generative AI

The authors have not employed any Generative AI tools.

References

[1] A. Khakhar, S. Mell, O. Bastani, PAC prediction sets for large language models of code, 2023. doi:10.48550/arXiv.2302.08703.
[2] S. H. Zargarbashi, M. S. Akhondzadeh, A. Bojchevski, Robust yet efficient conformal prediction sets, 2024. doi:10.48550/arXiv.2407.09165.
[3] C. Gupta, A. Podkopaev, A. Ramdas, Distribution-free binary classification: prediction sets, confidence intervals and calibration, 2020. URL: https://proceedings.neurips.cc/paper/2020/file/26d88423fc6da243ffddf161ca712757-Paper.pdf.
[4] H. Qiu, E. Dobriban, E. Tchetgen Tchetgen, Prediction sets adaptive to unknown covariate shift, Journal of the Royal Statistical Society Series B: Statistical Methodology 85(5) (2023) 1680-1705. doi:10.1093/jrsssb/qkad069.
[5] A. Angelopoulos, S. Bates, J. Malik, M. I. Jordan, Uncertainty sets for image classifiers using conformal prediction, 2020. URL: https://arxiv.org/abs/2009.14193.
[6] S. Bates, A. Angelopoulos, L. Lei, J. Malik, M. I. Jordan, Distribution-free, risk-controlling prediction sets, 2021. URL: https://arxiv.org/abs/2101.02703.
[7] T. Liu, Y. Jiang, N. Monath, R. Cotterell, M. Sachan, Autoregressive structured prediction with language models, 2022. URL: https://arxiv.org/abs/2210.14698.
[8] S. Park, O. Bastani, N. Matni, I. Lee, PAC confidence sets for deep neural networks via calibrated prediction, 2020. URL: https://arxiv.org/abs/2001.00106.
[9] A. N. Angelopoulos, S. Bates, A gentle introduction to conformal prediction and distribution-free uncertainty quantification, 2021. URL: https://arxiv.org/abs/2107.07511.
[10] A. N. Angelopoulos, S. Bates, A. Fisch, L. Lei, T. Schuster, Conformal risk control, 2022. URL: https://api.semanticscholar.org/CorpusID:251320513.
[11] C. Lu, A. Lemay, K. Chang, K. Hobel, J. Kalpathy-Cramer, Fair conformal predictors for applications in medical imaging, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 2022, pp. 12008-12016.
[12] M. Cauchois, S. Gupta, A. Ali, J. C. Duchi, Robust validation: confident predictions even when distributions shift, 2020. arXiv preprint arXiv:2008.04267.
[13] B.-S. Einbinder, S. Bates, A. N. Angelopoulos, A. Gendler, Y. Romano, Conformal prediction is robust to label noise, 2022. URL: https://api.semanticscholar.org/CorpusID:262091979.
[14] M. Kang, N. M. Gurel, L. Li, B. Li, COLEP: Certifiably robust learning-reasoning conformal prediction via probabilistic circuits, in: The Twelfth International Conference on Learning Representations, 2024. URL: https://openreview.net/forum?id=XN6ZPINdSg.
[15] K. Huang, Y. Jin, E. Candes, J. Leskovec, Uncertainty quantification over graph with conformalized graph neural networks, 2023. URL: https://arxiv.org/pdf/2305.14535.
[16] G.-H. Lee, Y. Yuan, S. Chang, T. Jaakkola, Tight certificates of adversarial robustness for randomly smoothed classifiers, in: Advances in Neural Information Processing Systems, volume 32, 2019.
[17] A. Bojchevski, J. Gasteiger, S. Günnemann, Efficient robustness certificates for discrete data: sparsity-aware randomized smoothing for graphs, images and more, in: International Conference on Machine Learning, PMLR, 2020, pp. 1003-1013.
[18] H. Salman, J. Li, I. Razenshteyn, P. Zhang, H. Zhang, S. Bubeck, G. Yang, Provably robust deep learning via adversarially trained smoothed classifiers, in: Advances in Neural Information Processing Systems, volume 32, 2019.
[19] B. M. Hill, Posterior distribution of percentiles: Bayes' theorem for sampling from a population, Journal of the American Statistical Association 63 (1968) 677-691.
[20] I. Madreimov, Yu. I. Petunin, Characterization of the uniform distribution using order statistics, Theory of Probability and Mathematical Statistics 27 (1983) 105-110.
[21] S. I. Lyashko, B. V. Rublev, Minimal ellipsoids and maximal simplexes in 3D Euclidean space, Cybernetics and Systems Analysis 39 (2003) 831-834. doi:10.1023/B:CASA.0000020224.83374.d7.