<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Quantifying Credit Portfolio sensitivity to asset correlations with interpretable generative neural networks</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sergio</forename><surname>Caprioli</surname></persName>
							<email>sergio.caprioli@intesasanpaolo.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Intesa Sanpaolo S.P.A</orgName>
								<address>
									<postCode>20121</postCode>
									<settlement>Milano</settlement>
									<region>MI</region>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Emanuele</forename><surname>Cagliero</surname></persName>
							<email>emanuele.cagliero@intesasanpaolo.com</email>
							<affiliation key="aff1">
								<orgName type="institution">Intesa Sanpaolo S.P.A</orgName>
								<address>
									<postCode>10138</postCode>
									<settlement>Torino</settlement>
									<region>TO</region>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Riccardo</forename><surname>Crupi</surname></persName>
							<email>riccardo.crupi@intesasanpaolo.com</email>
							<affiliation key="aff1">
								<orgName type="institution">Intesa Sanpaolo S.P.A</orgName>
								<address>
									<postCode>10138</postCode>
									<settlement>Torino</settlement>
									<region>TO</region>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Italian Workshop on Artificial Intelligence and Applications for Business and Industries</orgName>
								<address>
									<addrLine>November 9, 2023</addrLine>
									<settlement>Rome</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Quantifying Credit Portfolio sensitivity to asset correlations with interpretable generative neural networks</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">D8B063B7ED432CBC08F6A8FFCF1DCE2F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:54+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Variational Autoencoder</term>
					<term>VAE</term>
					<term>Credit Portfolio Model</term>
					<term>Concentration risk</term>
					<term>Interpretable neural networks</term>
					<term>Generative neural networks</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this research, we propose a novel approach for quantifying the sensitivity of credit portfolio Value-at-Risk (VaR) to asset correlations, using synthetic financial correlation matrices generated with deep learning models. In previous work, Generative Adversarial Networks (GANs) were employed to generate plausible correlation matrices that capture the essential characteristics observed in empirical correlation matrices estimated on asset returns. Instead of GANs, we employ Variational Autoencoders (VAEs) to achieve a more interpretable latent space representation. Through our analysis, we show that the VAE latent space can be a useful tool for capturing the crucial factors impacting portfolio diversification, particularly in relation to the sensitivity of credit portfolios to changes in asset correlations.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction 1.1. Credit Portfolio concentration risk</head><p>One of the most widely adopted models for measuring the credit risk of a loan portfolio was proposed in <ref type="bibr" target="#b0">[1]</ref> and is currently a market standard used by regulators for capital requirements <ref type="bibr" target="#b1">[2]</ref>. This model provides a closed-form expression for the risk of asymptotic single risk factor (ASRF) portfolios. The ASRF model is portfolio-invariant, i.e., the capital required for any given loan depends only on the risk of that loan, regardless of the portfolio it is added to. Hence the model ignores the concentration of exposures in bank portfolios, as the idiosyncratic risk is assumed to be fully diversified. Under the Basel framework, Pillar I capital requirements for credit risk do not cover concentration risk; banks are therefore expected to autonomously estimate such risk and set aside an appropriate capital buffer within the Pillar II process <ref type="bibr" target="#b2">[3]</ref>.</p><p>A commonly adopted methodology for measuring concentration risk, in the more general case of a portfolio exposed to multiple systematic factors and highly concentrated on a limited number of loans, is a Monte Carlo simulation of the portfolio loss distribution under the assumption reported in <ref type="bibr" target="#b3">[4]</ref>. The latter states that the standardized value of the 𝑖-th counterparty, 𝑉 𝑖 , is driven by a factor belonging to a set of macroeconomic Gaussian factors {𝑌 𝑗 } and by an independent idiosyncratic Gaussian process 𝜀 𝑖 :</p><formula xml:id="formula_0">𝑉 𝑖 = 𝜌 𝑖 𝑌 𝑗 + √ 1 − 𝜌 2 𝑖 𝜀 𝑖 = ∑ 𝑓 𝜌 𝑖 𝛼 𝑗,𝑓 𝑍 𝑓 + √ 1 − 𝜌 2 𝑖 𝜀 𝑖<label>(1)</label></formula><p>through a coefficient 𝜌 𝑖 . The systematic factors {𝑌 𝑗 } are generally assumed to be correlated, with correlation matrix Σ. The third term of Eq. 
1 makes use of the spectral decomposition Σ = 𝛼𝛼 𝑇 to express 𝑉 𝑖 as a linear combination of a set of uncorrelated factors {𝑍 𝑓 }, allowing for a straightforward Monte Carlo simulation. The bank's portfolio is usually clustered into sub-portfolios that are homogeneous in terms of risk characteristics (e.g. industrial sector, geographical area, rating class or counterparty size). A distribution of losses is simulated for each sub-portfolio and the Value at Risk (VaR) is calculated on the aggregated loss.</p><p>The asset correlation matrix Σ is a critical parameter for the estimation of the sub-portfolio loss distribution, which is the core component of the estimation of concentration risk. It is therefore worth assessing the sensitivity of the credit portfolio VaR to this parameter.</p></div>
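The Monte Carlo scheme implied by Eq. 1 can be sketched as follows. This is an illustrative simplification under stated assumptions, not the bank's production model: the function names, the default thresholds, and the mapping of each counterparty to a single systematic factor are ours.

```python
import numpy as np

def simulate_portfolio_losses(Sigma, rho, factor_idx, default_thr, lgd_exposure,
                              n_sims=100_000, seed=0):
    """Sketch of the multi-factor Monte Carlo of Eq. 1 (illustrative).

    Sigma        : (J, J) correlation matrix of the systematic factors {Y_j}
    rho          : (I,) factor loadings rho_i per counterparty
    factor_idx   : (I,) index j of the factor Y_j driving counterparty i
    default_thr  : (I,) default thresholds on the standardized value V_i
    lgd_exposure : (I,) loss-given-default times exposure per counterparty
    """
    rng = np.random.default_rng(seed)
    # Spectral decomposition Sigma = alpha alpha^T via the eigendecomposition
    eigval, eigvec = np.linalg.eigh(Sigma)
    alpha = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    J = Sigma.shape[0]
    losses = np.empty(n_sims)
    for s in range(n_sims):
        Z = rng.standard_normal(J)            # uncorrelated factors {Z_f}
        Y = alpha @ Z                         # correlated systematic factors {Y_j}
        eps = rng.standard_normal(len(rho))   # idiosyncratic shocks eps_i
        V = rho * Y[factor_idx] + np.sqrt(1.0 - rho**2) * eps
        losses[s] = lgd_exposure[V < default_thr].sum()
    return losses

def var(losses, q=0.999):
    """VaR as a quantile of the simulated loss distribution."""
    return np.quantile(losses, q)
```

Each simulated loss aggregates the exposures of the counterparties whose standardized value falls below their default threshold; the VaR is then a high quantile of the simulated losses.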
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Sampling Realistic Financial Correlation Matrices</head><p>As reported in <ref type="bibr" target="#b4">[5]</ref>,</p><p>"markets in crisis mode are an example of how assets correlate or diversify in times of stress. It is essential to see how markets, asset classes, and factors change their correlation and diversification properties in different market regimes. […] It is desirable not only to consider real manifestations of market scenarios from history but to simulate new, realistic scenarios systematically. To model the real world, quants turn to synthetic data, building artificially generated data based on so-called market generators."</p><p>Marti <ref type="bibr" target="#b5">[6]</ref> proposed Generative Adversarial Networks (GANs) to generate plausible financial correlation matrices, showing that the synthetic matrices generated with GANs exhibit most of the properties observed in empirical financial correlation matrices estimated on asset returns. In line with <ref type="bibr" target="#b5">[6]</ref>, we generated synthetic asset correlation matrices and verified that they satisfy several "stylized facts" of financial correlations.</p><p>We used a different type of neural network, the Variational Autoencoder (VAE), to map historical correlation matrices onto a bidimensional "latent space", also referred to as the bottleneck of the VAE. After training a VAE on a set of historical asset correlation matrices, we show that it is possible to explain the location of points in the latent space. Furthermore, analyzing the relationship between the VAE bidimensional bottleneck and the VaR values computed by the Credit Portfolio Model using different historical asset correlation matrices, we show that the distribution of the latent variables encodes the main aspects impacting portfolio diversification, as presented in <ref type="bibr" target="#b6">[7]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Sensitivity to the Asset Correlation matrix 2.1. Data</head><p>The dataset contains 𝑛 = 206 correlation matrices of the monthly log-returns of 𝑀 = 44 equity indices, calculated on their monthly time series from February 1997 to June 2022 using overlapping rolling windows of 100 months. The historical time series considered are the Total Market indices (Italy, Europe, US and Emerging Markets) and their related sector indices (Consumer Discretionary, Energy, Basic Materials, Industrials, Consumer Staples, Telecom, Utilities, Technology, Financials, Health Care); the source is Datastream.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Variational Autoencoder design</head><p>An autoencoder is a neural network composed of a sequence of layers (the "encoder" E) that compress the input into a low-dimensional "latent" vector, followed by another sequence of layers (the "decoder" D) that approximately reconstruct the input from the latent vector. The encoder and decoder are trained together to minimize the difference between the original input and its reconstruction.</p><p>Variational Autoencoders <ref type="bibr" target="#b7">[8]</ref> consider a probabilistic latent space, defined by a latent random variable 𝑧 that generates the observed samples 𝑥. Hence the "probabilistic decoder" is given by 𝑝(𝑥|𝑧), while the "probabilistic encoder" is 𝑞(𝑧|𝑥). The underlying assumption is that the data are generated by a random process involving an unobserved continuous random variable 𝑧, consisting of two steps: (1) a value 𝑧 𝑖 is generated from some prior distribution 𝑝 * 𝜃 (𝑧); (2) a value x 𝑖 is generated from some conditional distribution 𝑝 * 𝜃 (𝑥|𝑧). Assuming that the prior 𝑝 * 𝜃 (𝑧) and the likelihood 𝑝 * 𝜃 (𝑥|𝑧) come from parametric families of distributions 𝑝 𝜃 (𝑧) and 𝑝 𝜃 (𝑥|𝑧), and that their PDFs are differentiable almost everywhere w.r.t. both 𝜃 and 𝑧, the algorithm proposed by <ref type="bibr" target="#b7">[8]</ref> for the estimation of the posterior 𝑝 𝜃 (𝑧|𝑥) introduces an approximation 𝑞 𝜙 (𝑧|𝑥) and minimizes the Kullback-Leibler (KL) divergence of the approximate 𝑞 𝜙 (𝑧|𝑥) from the true posterior 𝑝 𝜃 (𝑧|𝑥). Using a multivariate normal as the prior distribution, the loss function is composed of a deterministic component (the mean squared error, MSE) and a probabilistic component (i.e. 
the Kullback-Leibler divergence of the approximate posterior from the prior):</p><formula xml:id="formula_1">𝐾 𝐿 = − 1 2𝑛 𝑛 ∑ 𝑖=1 2 ∑ 𝑘=1 (1 + 𝑙𝑜𝑔(𝜎 𝑖 𝑘 2 ) − 𝜇 𝑖 𝑘 2 − 𝜎 𝑖 𝑘 2 ) MSE = 1 𝑛 𝑛 ∑ 𝑖=1 ∥ x 𝑖 − x̂ 𝑖 ∥ 2 2 = 1 𝑛 𝑛 ∑ 𝑖=1 ∥ x 𝑖 − 𝐷(𝐸(x 𝑖 )) ∥ 2 2 Loss = MSE + 𝛽 ⋅ KL<label>(2)</label></formula><p>where 𝐸 and 𝐷 are the encoding and decoding maps, respectively:</p><formula xml:id="formula_2">𝐸 ∶ x ∈ R 𝑀×𝑀 ⟶ 𝜃 = {𝜇 1 , 𝜇 2 , 𝜎 1 , 𝜎 2 } ∈ R 4 , 𝐷 ∶ z ∈ R 2 ⟶ x̂ ∈ R 𝑀×𝑀 , z = 𝜇 + 𝜎 ⊙ 𝜀, 𝜇 = {𝜇 1 , 𝜇 2 }, 𝜎 = {𝜎 1 , 𝜎 2 }</formula><p>where 𝜀 is a bivariate standard Gaussian variable and 𝑛 is the number of samples in the training set.</p><p>In this equation, 𝜇 𝑖 𝑘 and 𝜎 𝑖 𝑘 denote the mean and standard deviation of the 𝑘-th dimension of the latent space for the sample x 𝑖 . The loss function balances the MSE, reflecting the reconstruction quality, against 𝛽 times the KL divergence, which enforces distribution matching in the 2-dimensional latent space. The KL divergence can be viewed as a regularizer of the model and 𝛽 as the strength of the regularization.</p><p>We trained the VAE for 80 epochs using a learning rate of 0.0001 with the Adam optimizer. The structure of the VAE is shown in Fig. <ref type="figure" target="#fig_0">1</ref>. We randomly split the dataset described in Section 2.1 into a training sample, used to train the network, and a validation set, used to evaluate performance; the validation set comprises 30% of the dataset.</p><p>Variational Autoencoders have been employed in previous works for financial applications. In particular, Brugier and Turinici <ref type="bibr" target="#b8">[9]</ref> proposed a VAE to compute an estimator of the Value at Risk of a financial asset. Bergeron et al. <ref type="bibr" target="#b9">[10]</ref> used VAEs to estimate missing points on partially observed volatility surfaces, and Sokol <ref type="bibr" target="#b10">[11]</ref> applied VAEs to interest rate curve simulation.</p></div>
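A minimal numerical sketch of the loss in Eq. (2), assuming the encoder outputs the mean and log-variance of the bivariate Gaussian (the function names are ours):

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Loss of Eq. (2): MSE reconstruction term plus beta times the KL
    divergence between q(z|x) = N(mu, sigma^2) and the standard normal prior.
    Illustrative shapes: x, x_hat -> (n, M*M); mu, log_var -> (n, 2)."""
    n = x.shape[0]
    mse = np.sum((x - x_hat) ** 2) / n
    # KL term of Eq. (2), with log_var = log(sigma^2)
    kl = -0.5 / n * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
    return mse + beta * kl

def reparameterize(mu, log_var, rng):
    """z = mu + sigma ⊙ eps with eps ~ N(0, I); the reparameterization
    trick keeps the sampling step differentiable w.r.t. mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

When mu = 0 and sigma = 1 the KL term vanishes, so the loss reduces to the pure reconstruction error, matching the regularizer interpretation of 𝛽 above.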
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Comparison with linear models</head><p>We compared the performance of the Variational Autoencoder with that of a standard Autoencoder (AE) and of a linear autoencoder (i.e. an autoencoder without activation functions).</p><p>The linear autoencoder is equivalent to applying PCA to the input data, in the sense that its output is a projection of the data onto the low-dimensional principal subspace <ref type="bibr" target="#b11">[12]</ref>. As shown in Fig. <ref type="figure" target="#fig_1">2b</ref>, the autoencoder performs better than the VAE (Fig. <ref type="figure" target="#fig_1">2a</ref>), while the linear models perform worse (Fig. <ref type="figure" target="#fig_2">3a</ref>) even when the dimension of the latent space is increased (Fig. <ref type="figure" target="#fig_2">3b</ref>). Hence, neural networks actually bring an improvement in minimizing the reconstruction error. The generative probabilistic component of the VAE decreases performance compared to a deterministic autoencoder; on the other hand, it makes it possible to generate new yet realistic correlation matrices, in the sense of the stylized facts.</p></div>
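The PCA equivalence can be checked directly: the reconstruction error attained by projecting onto the top-k principal subspace is the optimum a linear autoencoder with a k-dimensional bottleneck can reach. A small sketch, with our naming, of that baseline error:

```python
import numpy as np

def pca_reconstruction_mse(X, k):
    """Mean squared reconstruction error of projecting the data onto the
    top-k principal subspace -- the best a k-dim linear autoencoder can do
    (illustrative check of the equivalence cited as [12])."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Principal directions from the SVD of the centered data matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k]                                # (k, d) top-k directions
    X_hat = Xc @ W.T @ W + mean               # project and reconstruct
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))
```

The error is non-increasing in k and vanishes when k equals the data dimension, which is why only the nonlinear models can beat this baseline at a fixed bottleneck size.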
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Latent space interpretability</head><p>According to Miller <ref type="bibr" target="#b12">[13]</ref> and Lipton <ref type="bibr" target="#b13">[14]</ref>, a model is interpretable if an observer can understand the cause of its decisions; explanation is one mode in which an observer may obtain such understanding, for instance by building a simple surrogate model that mimics the original model to gain insight into its underlying mechanics.</p><p>For the sake of our analysis, we refer to the "interpretability" of the VAE as the possibility of understanding the reasons underlying the responses produced by the algorithm in the latent space. The Variational Autoencoder projected the 206 historical correlation matrices onto a two-dimensional probabilistic latent space represented by a bivariate normal distribution. As shown in Fig. <ref type="figure" target="#fig_4">4a</ref>, the latent spaces generated by the VAE and the AE are similar, while the cluster of points in the middle is recovered only by the 3-dimensional linear autoencoder (Fig. <ref type="figure" target="#fig_4">4b</ref>).</p><p>In order to understand the rationale underlying this representation, we analysed the relationship between the encoded values of the original correlation matrices and their eigenvectors {𝜈 𝑖 | 𝑖 = 1 ∶ 𝑀} and eigenvalues {𝜆 𝑖 | 𝑖 = 1 ∶ 𝑀}. It turned out that the first component of the latent space (𝜇 1 ) is strongly negatively correlated with the first eigenvalue (Fig. <ref type="figure" target="#fig_5">5</ref>).</p><p>As pointed out in <ref type="bibr" target="#b14">[15]</ref>, "the largest eigenvalue of the correlation matrix is a measure of the intensity of the correlation present in the matrix, and in matrices inferred from financial returns tends to be significantly larger than the second largest. 
Generally, this largest eigenvalue is larger during times of stress and smaller during times of calm."</p><p>Hence, the first dimension of the latent space seems to capture information related to the effective rank of the matrix, i.e. to the "diversification opportunities" on the market. The interpretation of the second dimension (𝜇 2 ) of the latent space turned out to be related to the eigenvectors of the correlation matrix. In order to understand this dimension, we consider the cosine similarity 𝛼 𝑖,𝑡 between the 𝑖-th eigenvector at time 𝑡 and its average over time. Formally:</p><formula xml:id="formula_3">𝛼 𝑖,𝑡 = 1 𝑛 (∑ 𝑛 𝑡=1 𝜈 𝑖,𝑡 ) 𝑇 ⋅ 𝜈 𝑖,𝑡 ∥ 1 𝑛 (∑ 𝑛 𝑡=1 𝜈 𝑖,𝑡 ) 𝑇 ∥∥ 𝜈 𝑖,𝑡 ∥<label>(3)</label></formula><p>where 𝑖 indexes the eigenvector and 𝑡 the matrix in the dataset. Let us define 𝛼 1 = {𝛼 1,𝑡 } 𝑡=1,…,𝑛 and 𝛼 2 = {𝛼 2,𝑡 } 𝑡=1,…,𝑛 . The data point subgroups observed in the space (𝛼 1 , 𝛼 2 , 𝜆 1 ) can be traced to corresponding subgroups in the latent space (𝜇 1 , 𝜇 2 ), as shown in Fig. <ref type="figure" target="#fig_7">6</ref>.</p><p>As pointed out in <ref type="bibr" target="#b6">[7]</ref>, each eigenvector can be viewed as a vector of portfolio weights defining a new index that is uncorrelated with the indices defined by the other eigenvectors. It follows that a change in the eigenvectors can impact portfolio diversification. We can conclude that the VAE latent space effectively captures, in two dimensions, the main factors driving financial correlations, which are determinant for portfolio diversification.</p></div>
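Eq. (3) can be computed as below. The sign-alignment step is our assumption, not part of the paper: eigenvectors are defined only up to sign, so some convention is needed before averaging them over time.

```python
import numpy as np

def eigenvector_alignment(eigvecs):
    """Eq. (3): cosine similarity alpha_{i,t} between the i-th eigenvector
    at time t and its time average. `eigvecs` has shape (n, M): row t is
    the i-th eigenvector nu_{i,t} of the t-th correlation matrix."""
    # Assumption: resolve the sign indeterminacy by aligning every
    # eigenvector with the first one before averaging.
    signs = np.sign(eigvecs @ eigvecs[0])
    aligned = eigvecs * signs[:, None]
    mean_vec = aligned.mean(axis=0)           # time average of nu_{i,t}
    num = aligned @ mean_vec                  # dot products with the average
    den = np.linalg.norm(mean_vec) * np.linalg.norm(aligned, axis=1)
    return num / den
```

When the eigenvector is stable over time, alpha stays close to 1; rotations of the eigenvector away from its historical average push alpha down, which is what separates the subgroups of Fig. 6.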
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.">Generating synthetic correlation matrices</head><p>As explained in Section 2.2, the probabilistic decoder of the VAE makes it possible to generate a "plausible" correlation matrix starting from any point of the latent space. Hence, we defined a grid of 132 points in the latent space that covers, approximately uniformly, an area centered at the origin and including the historical points. For each point on the grid, we used the decoder (described in Section 2.2) to compute the corresponding correlation matrix. Along the lines of <ref type="bibr" target="#b5">[6]</ref>, we checked whether the following stylized facts of financial correlation matrices hold for both the historical and the synthetic matrices.</p><p>• The distribution of pairwise correlations is significantly shifted towards positive values.</p><p>• Eigenvalues follow the Marchenko-Pastur distribution, except for a very large first eigenvalue and a couple of other large eigenvalues.</p><p>• The Perron-Frobenius property holds (the first eigenvector has positive entries).</p><p>• Correlations have a hierarchical structure.</p><p>• The Minimum Spanning Tree (MST) obtained from a correlation matrix satisfies the scale-free property.</p><p>We verified that the distributions of pairwise correlations are shifted towards positive values and that the distributions of the eigenvalues (each averaged respectively over the historical and synthetic matrices) are very similar to each other and can be approximated by a Marchenko-Pastur distribution, except for a large first eigenvalue and a couple of other large eigenvalues. Regarding the Perron-Frobenius property, we verified that the eigenvector corresponding to the largest eigenvalue has strictly positive components. Inspecting the dendrograms of the correlation matrices, we confirmed the presence of a hierarchical structure. 
Finally, the distribution of the degrees of the Minimum Spanning Tree (calculated on the mean of the matrices) is compatible with the scale-free property, i.e. very few nodes have a high degree while most nodes have degree 1. It is worth noting that the correlation matrices analyzed for our purposes were calculated from 44 equity indices (as explained in Section 2.1) rather than from single stocks as in <ref type="bibr" target="#b5">[6]</ref>; hence a higher degree of concentration was expected.</p></div>
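Some of these stylized-fact checks can be expressed as simple numerical tests. The sketch below covers three of them; the factor of 2 separating the first eigenvalue from the second is an illustrative assumption, not the authors' criterion:

```python
import numpy as np

def stylized_fact_checks(C):
    """Light checks of three stylized facts for a correlation matrix C
    (illustrative thresholds, not the tests used in the paper)."""
    # 1) Pairwise correlations shifted towards positive values
    off_diag = C[~np.eye(len(C), dtype=bool)]
    positive_shift = off_diag.mean() > 0.0
    # 2) A very large first eigenvalue (eigh returns ascending order)
    eigval, eigvec = np.linalg.eigh(C)
    dominant_first = eigval[-1] > 2.0 * eigval[-2]
    # 3) Perron-Frobenius: the leading eigenvector has entries of one sign
    v1 = eigvec[:, -1]
    perron_frobenius = bool(np.all(v1 > 0) or np.all(v1 < 0))
    return positive_shift, dominant_first, perron_frobenius
```

The hierarchical-structure and MST checks need a clustering/graph step (e.g. a dendrogram and a spanning-tree degree count) and are omitted from this sketch.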
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.6.">Quantifying the sensitivity to asset correlations</head><p>For each matrix generated with the VAE probabilistic decoder, we estimated the corresponding VaR according to the multi-factor Vasicek model described in Section 1.1. We used the VaR metric to provide a proof of concept of the methodology and to be aligned with Economic Capital requirements, but the same rationale can be followed with a different risk metric. As mentioned in Section 1.1, the multi-factor Vasicek model does not admit a closed-form solution, hence a Monte Carlo simulation must be run for each generated matrix. We used stratified sampling with 1 million runs. In each estimation, the parameters of the model and the portfolio exposures are held constant. Running the simulation for every sampled point of the latent space, we derived the VaR surface of Fig. <ref type="figure" target="#fig_8">7</ref>.</p><p>To estimate the sensitivity of the VaR to possible future evolutions of the correlation matrix, we "bootstrapped" (see Fig. <ref type="figure" target="#fig_10">9</ref>) the historical time series of the points in the 2-dimensional latent space. We applied both a simple bootstrap <ref type="bibr" target="#b15">[16]</ref> and a block-bootstrapping technique <ref type="bibr" target="#b16">[17]</ref> to the time series of differences of the two components of the VAE latent space, 𝜇 1 and 𝜇 2 (depicted in Fig. <ref type="figure" target="#fig_9">8</ref>).</p><p>Interpolating the estimated VaR over the sampled grid (Fig. <ref type="figure" target="#fig_8">7</ref>), we can derive the Value at Risk corresponding to any point of the latent space. 
Hence, for each point belonging to the distribution of correlation changes over a 1-year time horizon estimated via bootstrap, we can calculate the corresponding VaR without resorting to a Monte Carlo simulation.</p><p>In this way, we obtained the VaR distribution associated with the possible variations of the correlation matrices over a defined time horizon.</p></div>
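The bootstrap-plus-surface step can be sketched as follows. Here `var_surface` is an assumed callable standing in for the interpolated VaR surface of Fig. 7, and the i.i.d. resampling corresponds to the simple bootstrap (the block variant would instead resample contiguous blocks of differences):

```python
import numpy as np

def bootstrap_var_distribution(latent_path, var_surface, horizon=12,
                               n_boot=10_000, seed=0):
    """Sketch: resample monthly differences of (mu1, mu2), cumulate them
    over a 1-year horizon from the last observed point, and read the VaR
    off the interpolated surface. `latent_path` has shape (n, 2)."""
    rng = np.random.default_rng(seed)
    diffs = np.diff(latent_path, axis=0)       # monthly latent-space changes
    last = latent_path[-1]                     # current (mu1, mu2) point
    # Simple (i.i.d.) bootstrap: draw `horizon` monthly changes per scenario
    idx = rng.integers(0, len(diffs), size=(n_boot, horizon))
    endpoints = last + diffs[idx].sum(axis=1)  # 1-year-ahead latent points
    # Evaluate the (interpolated) VaR surface at every bootstrapped endpoint
    return np.array([var_surface(m1, m2) for m1, m2 in endpoints])
```

Each evaluation replaces a full Monte Carlo run with a surface lookup, which is what makes the resulting VaR distribution cheap to obtain.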
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Conclusions</head><p>In this work we applied a Variational Autoencoder to generate realistic financial correlation matrices, which we used as input for the estimation of credit portfolio concentration risk with a multi-factor Vasicek model. We deviated from the methodology proposed by Marti <ref type="bibr" target="#b5">[6]</ref>, who adopted a Generative Adversarial Network, in order to obtain an interpretable model by leveraging the dimensionality reduction provided by the VAE. Using as a proof of concept a VAE trained on a dataset of 206 correlation matrices, calculated on the time series of 44 equity indices with a rolling window of 100 months, we showed how it is possible, even with a small data sample, to derive an interpretation of the latent space that appears aligned with the main aspects driving portfolio diversification <ref type="bibr" target="#b6">[7]</ref>.</p><p>We exploited the generative capabilities of the VAE to extend the scope of the model beyond the necessarily limited size of the historical sample, generating a larger set of correlation matrices that retain the realistic features observed in the market. The VAE has therefore primarily been utilised for data augmentation, its efficacy being assessed in terms of the quality of the artificially generated matrices, determined by suitably testing the known stylized facts of financial correlation matrices.</p><p>We computed the augmented sample of synthetic correlation matrices on a grid in the 2-dimensional VAE latent space, and for each synthetic matrix the corresponding credit portfolio loss distribution (and its VaR at a given percentile) was obtained via Monte Carlo simulation under a multi-factor Vasicek model. In this way we estimated a VaR surface over the VAE latent space.</p><p>Analyzing the time series of the encoded version of the correlation matrices (i.e. 
the two components of the probabilistic latent space), we estimated via bootstrapping the possible variation of the correlation matrices over a 1-year time horizon. Finally, using the interpolated VaR surface, we estimated the corresponding VaR distribution, obtaining a quantification of the impact of correlation movements on credit portfolio concentration risk. This approach gives a rapid estimation of risk without depending on the extensive computations of a Monte Carlo simulation, and it does so in a compressed, easy-to-visualize space that captures several aspects of market dynamics.</p><p>Our analysis provides clear indications that the realistic data-augmentation capabilities of Variational Autoencoders, combined with the ability to obtain model interpretability, can prove useful for risk management purposes when addressing the sensitivity of models to structured multidimensional market data such as the correlation matrix.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: VAE framework: The input layer comprises 1936 nodes, corresponding to the 44 × 44 matrix input. Subsequently, there are layers with 512, 250, and a central hidden layer with 4 nodes. These values represent the mean and variance of a bivariate Gaussian distribution. The decoder receives as input two values sampled from the latent space and is asked to reconstruct the input. Hence, the architecture is symmetrically mirrored until the output layer, which also has 1936 nodes.</figDesc><graphic coords="4,89.29,84.19,416.70,246.97" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Histogram of Mean squared error (MSE) of the Autoencoder and Variational Autoencoder on the historical correlation matrices, split into train and validation set.</figDesc><graphic coords="5,339.31,267.08,166.68,112.71" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Histogram of Mean squared error (MSE) of the linear autoencoders on the historical correlation matrices, split into train and validation set.</figDesc><graphic coords="5,89.29,263.21,166.68,116.58" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>(a) VAE, AE and 2d-PCA latent space (b) 3-d PCA latent space</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Comparison of the latent space generated with different models. The latent space generated by the VAE and AE are similar while the cluster of points in the middle is recovered only by the 3dimensional linear autoencoder.</figDesc><graphic coords="6,89.29,84.19,204.18,202.28" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Scatterplot of the first eigenvalue 𝜆 1 versus the first component of the latent space 𝜇 1 , showing a clear negative correlation.</figDesc><graphic coords="7,172.63,84.19,250.03,179.69" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head></head><label></label><figDesc>(a) The points in the latent space (𝜇 1 , 𝜇 2 ) representing the historical correlation matrices. The latent space was conventionally partitioned into nine subgroups of data points identified by different colors. (b) The data points of Fig. (a) represented in the parameter space defined by 𝛼 1 , 𝛼 2 , and 𝜆 1 (the size of each dot corresponds to the value of 𝜆 2 ). The proximity of these data points consistently mirrors the subgroups illustrated in Fig. (a), with matching color schemes. There is a noticeable separation between the different subgroups, and this separation is well-defined in most cases. (c) Sampling in the latent space (𝜇 1 , 𝜇 2 ): each point can be decoded into a synthetic correlation matrix. The latent space was conventionally partitioned into nine regions identified by different colors, with the same convention as Fig. (a). (d) The sampled points of Fig. (c), plotted with the same color schemes in the space formed by 𝛼 1 , 𝛼 2 and 𝜆 1 (the size of each dot corresponds to the value of 𝜆 2 ). Similar observations and considerations can be drawn here as those made for Fig. (a) and Fig. (b).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: The distribution of the distance of the first two eigenvectors from their respective historical average and the distribution of the first eigenvalue characterize the regions of the latent space.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: The surface generated from Value at Risk with respect to the points of the 2-dimensional latent space.</figDesc><graphic coords="12,172.63,116.24,250.03,252.39" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: The time series of the variations of 𝜇 1 and 𝜇 2 , projections of the 206 historical correlation matrices in the 2-dimensional latent space.</figDesc><graphic coords="12,89.29,474.01,416.69,138.41" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Using simple bootstrap (on the left) or block-bootstrap (with 11 monthly steps) on the "compressed" representation of the correlation matrices, we estimated the distribution of the possible variation of the current matrix on 1-year time horizon.</figDesc><graphic coords="13,141.38,281.89,312.53,151.90" type="bitmap" /></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Disclaimer</head><p>The views and opinions expressed within this paper are those of the authors and do not necessarily reflect the official policy or position of Intesa Sanpaolo. Assumptions made in the analysis, assessment, methodology, model and results are not reflective of the position of any entity other than the authors.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">A</forename><surname>Vasicek</surname></persName>
		</author>
		<title level="m">Probability of loss on loan portfolio</title>
				<imprint>
			<publisher>KMV</publisher>
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m">Regulation (EU) No 575/2013 of the European Parliament and of the Council of 26 June 2013 on prudential requirements for credit institutions and investment firms and amending Regulation (EU) No 648/2012</title>
				<imprint>
			<publisher>Official Journal of the European Union</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">575</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">CEBS Guidelines on the management of concentration risk under the supervisory review process</title>
		<author>
			<persName><surname>Committee of European Banking Supervisors</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page">31</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Grippa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Gornicka</surname></persName>
		</author>
		<title level="m">Measuring concentration risk-A partial portfolio approach</title>
				<imprint>
			<publisher>International Monetary Fund</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Matrix evolutions: synthetic correlations and explainable machine learning for constructing robust investment portfolios</title>
		<author>
			<persName><forename type="first">J</forename><surname>Papenbrock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Schwendner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jaeger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Krügel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Financial Data Science</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="51" to="69" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">CorrGAN: Sampling realistic financial correlation matrices using generative adversarial networks</title>
		<author>
			<persName><forename type="first">G</forename><surname>Marti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="8459" to="8463" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">An analysis of eigenvectors of a stock market cross-correlation matrix</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">T</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Nguyen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Econometrics for Financial Applications</title>
		<imprint>
			<biblScope unit="page" from="504" to="513" />
			<date type="published" when="2018">2018</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Welling</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1312.6114</idno>
		<title level="m">Auto-encoding variational Bayes</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Deep learning of value at risk through generative neural network models: The case of the variational auto encoder</title>
		<author>
			<persName><forename type="first">P</forename><surname>Brugière</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Turinici</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">MethodsX</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">102192</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Variational autoencoders: A hands-off approach to volatility</title>
		<author>
			<persName><forename type="first">M</forename><surname>Bergeron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Fung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Poulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Veneris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Financial Data Science</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Autoencoder market models for interest rates</title>
		<author>
			<persName><forename type="first">A</forename><surname>Sokol</surname></persName>
		</author>
		<idno>SSRN 4300756</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Plaut</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1804.10253</idno>
		<title level="m">From principal subspaces to principal components with linear autoencoders</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Explanation in artificial intelligence: Insights from the social sciences</title>
		<author>
			<persName><forename type="first">T</forename><surname>Miller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">267</biblScope>
			<biblScope unit="page" from="1" to="38" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery</title>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">C</forename><surname>Lipton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Queue</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="31" to="57" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Construction of minimum spanning trees from financial returns using rank correlation</title>
		<author>
			<persName><forename type="first">T</forename><surname>Millington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Niranjan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Physica A: Statistical Mechanics and its Applications</title>
		<imprint>
			<biblScope unit="volume">566</biblScope>
			<biblScope unit="page">125605</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Bootstrapping</title>
		<author>
			<persName><forename type="first">S</forename><surname>Abney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th annual meeting of the Association for Computational Linguistics</title>
				<meeting>the 40th annual meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="360" to="367" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Block-bootstrapping for noisy data</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Mader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sommerlade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Timmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schelter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Neuroscience Methods</title>
		<imprint>
			<biblScope unit="volume">219</biblScope>
			<biblScope unit="page" from="285" to="291" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
