Globally local and fast explanations of t-SNE-like nonlinear embeddings

Pierre Lambert1,∗, Rebecca Marion2, Julien Albert2, Emmanuel Jean3, Sacha Corbugy2 and Cyril de Bodt1,4
1 UCLouvain - ICTEAM & TRAIL, Louvain-la-Neuve, Belgium
2 UNamur - NaDI/PReCISE & TRAIL, Namur, Belgium
3 Multitel & TRAIL, Mons, Belgium
4 MIT Media Lab, Cambridge, MA, USA

Abstract
Nonlinear dimensionality reduction (NLDR) algorithms such as t-SNE are often employed to visually analyze high-dimensional (HD) data sets in the form of low-dimensional (LD) embeddings. Unfortunately, the nonlinearity of the NLDR process prohibits the interpretation of the resulting embeddings in terms of the HD features. State-of-the-art studies propose post-hoc explanation approaches to locally explain the embeddings. However, such tools are typically slow and do not automatically cover the entire LD embedding, instead providing local explanations around one selected data point at a time. This prevents users from quickly gaining insights about the general explainability landscape of the embedding. This paper presents a globally local and fast explanation framework for NLDR embeddings. This framework is fast because it only requires the computation of sparse linear regression models on subsets of the data, without ever reapplying the NLDR algorithm itself. In addition, the framework is globally local in the sense that the entire LD embedding is automatically covered by multiple local explanations. The different interpretable structures in the embedding are directly characterized, making it possible to quantify the importance of the HD features in various regions of the LD embedding. An example use-case is examined, emphasizing the value of the presented framework. Public code and software are available at https://github.com/PierreLambert3/glocally_explained.

Keywords
dimensionality reduction, data visualization, interactivity, interpretability, explainability, t-SNE, data exploration

Advances in Interpretable Machine Learning and Artificial Intelligence, October 21, 2022, Atlanta, Georgia, USA
∗ Corresponding author.
pierre.h.lambert@uclouvain.be (P. Lambert); cyril.debodt@uclouvain.be (C. de Bodt)
https://github.com/PierreLambert3 (P. Lambert); https://github.com/cdebodt (C. de Bodt)
ORCID: 0000-0003-2347-1756 (C. de Bodt)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Dimensionality reduction (DR) computes low-dimensional (LD) representations of high-dimensional (HD) data, e.g., to visually explore them or to curb the curse of dimensionality [1]. The relevance of a DR method for a given visualization task typically depends on its preservation of the HD neighborhoods in the resulting LD embedding [2]. Two major frameworks have been proposed for projecting from HD to LD coordinates [1]: one is based on preserving distances [3], while the other is based on reproducing neighborhoods [4, 5]. For instance, distance-preserving methods like principal component analysis (PCA) [6] and classical metric multidimensional scaling (MDS) [3] project HD samples linearly; nonlinear variants of these methods (e.g., [7, 8]) aim to preserve weighted Euclidean or approximately geodesic distances. Numerous other schemes have also been developed that determine the LD embedding design based on HD affinity matrices [9, 10]. Regrettably, the local neighborhood preservation of all of these techniques is limited in visualization contexts by the norm concentration phenomenon [11, 12], most probably due to their distance-preserving nature [1, 13]. In contrast, the native shift invariance of neighbor embedding (NE) algorithms [14] such as Stochastic Neighbor Embedding (SNE) [5] mitigates this phenomenon, leading to astonishing DR quality. These achievements have naturally encouraged the development of numerous SNE-based methods, such as the popular t-SNE [15], UMAP [16], multi-scale perplexity-free approaches [17, 18, 19], etc.

While these nonlinear DR (NLDR) algorithms deliver impressively faithful LD embeddings with respect to the HD data, their intrinsic nonlinearity greatly affects the interpretability of the LD representations. Indeed, the obtained LD dimensions are hardly or most often not interpretable in terms of the HD features [20].

Since NLDR methods are not interpretable by design, previous studies have developed techniques to analyze and interpret the LD embeddings, which is known as post-hoc explanation or interpretability [21]. One can for instance cite [22], which proposes to explain visual LD clusters thanks to decision trees. On the other hand, [21] locally explains t-SNE embeddings by adapting LIME; the authors argue that explaining the entire embedding at once would be difficult, as t-SNE usually does not preserve large HD distances well [20]. However, the local nature of t-SNE motivates the computation of local explanations in the LD embedding; LIME can then be revisited and performed locally around a user-selected data point.

Figure 1: Interface for the proposed globally local and fast explanation framework.
Nevertheless, such an approach has two main limitations: (1) it is slow, and (2) it does not cover the entire LD embedding automatically, as local explanations are only provided around data points that have been selected, one at a time. This approach is slow because, in order to explain a given data point's position in the embedding, t-SNE must be reapplied to many artificially simulated points around that data point; the non-parametric nature of t-SNE, combined with its significant computational cost, greatly increases computation time, which decreases the potential for interactivity. The second limitation of the method is that the user only receives a local explanation around the selected point in the embedding; she must thus explore the various regions of the embedding manually. This is not realistic in practice, especially when working with large databases, and even more so since the approach is not fast.

This paper aims to address these limitations by developing a fast and globally local explanation framework for NLDR embeddings. Based on the BIOT explanation approach [23], this framework learns sparse linear regression models for subsets of the data set and does not require a reapplication of the NLDR algorithm, making it fast. The globally local nature of our approach refers to the fact that multiple local explanations are automatically computed over the entire LD embedding (i.e., globally). Such automatic processing enables the user to directly glimpse the overall explainability landscape of the embedding, as well as a structured overview of the impact of the HD features in the various parts of the LD embedding.

The regions for which local explanations are learned in the LD embedding can be determined in different ways [24]: using a clustering algorithm such as K-means, as in this work, thanks to a manual selection performed by the user, or by recursively splitting the embedding into subcells along the LD dimensions based on a model error criterion.

Our fast and globally local explanation framework can be viewed as taking the best of both the linear and nonlinear projection worlds: the LD embedding can indeed be generated by a nonlinear DR algorithm, achieving much better DR quality in terms of data visualization thanks to increased flexibility and adaptability [12, 15, 2]. On the other hand, the computed local explanations are linear and sparse, which promotes interpretability. Moreover, the globally local explanations make it possible to readily depict the importance of the HD features in the different regions of the LD embedding.

As an experiment, an example use-case on a public data set is presented, highlighting the usefulness of the proposed approach. Free code and software are publicly available online (https://github.com/PierreLambert3/glocally_explained), enabling the easy use of the proposed framework.

This paper is organized as follows: Section 2 first reviews some related works. Section 3 then presents our proposed approach, while Section 4 discusses an example use-case. Section 5 draws final conclusions.

2. Related works

Interpreting NLDR techniques is a challenging task. To tackle this challenge, various approaches have been proposed. Some papers (e.g., [25, 26, 27]) have proposed methods for explaining the LD embedding dimensions with respect to the HD features. Since local NLDR algorithms such as t-SNE do not effectively preserve large distances, explaining the resulting embedding dimensions with these methods may be misleading. Other methods attempt to interpret NLDR results by explaining visual clusters [22, 28, 29]. For example, in [22], the authors propose an interactive pipeline for explaining clusters in the LD embedding using decision trees; this pipeline enables the user to manually select LD clusters, which are then explained in terms of the HD features with a decision tree, an interpretable model. The resulting model can be used to explain why certain data points are clustered together and to identify the HD features that distinguish the different clusters. In contrast, our proposed approach aims to understand intra-cluster positions, i.e., the HD features that make two points from the same cluster lie at different corners of this cluster. Moreover, our framework makes it possible to not only explain LD clusters, but more generally to interpret the overall positions of the points in the embedding.
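The cluster-explanation strategy of [22] can be illustrated in a few lines: a decision tree is trained to predict cluster membership from the HD features, and its splits name the features that separate the clusters. The sketch below is only a minimal illustration of this general idea, not the interactive IXVC pipeline itself; the synthetic data, the K-means clustering standing in for user-selected clusters, and all variable names are illustrative placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # HD features (synthetic placeholder)
# Toy 2-D "embedding": in practice this would come from t-SNE or a similar NLDR method
Y = np.c_[X[:, 0], X[:, 1]] + 0.1 * rng.normal(size=(200, 2))

# Visual clusters in the embedding (standing in for a manual user selection)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Y)

# A shallow tree explaining cluster membership in terms of the HD features
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, labels)
print(export_text(tree, feature_names=[f"feat_{j}" for j in range(5)]))
```

The printed rules expose which HD features drive the separation between clusters, which is the kind of inter-cluster explanation that the approach of this paper complements with intra-cluster explanations.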
Other existing methods aim to locally and linearly explain the position of a specific instance in the LD space. In particular, [21] adapts LIME [30] to locally explain t-SNE embeddings. The original version of LIME involves three steps. First, it samples instances around a point of interest. Then, it queries the model for these instances. Finally, it fits an interpretable model on the results of the queries. In [21], the authors use the SMOTE oversampling technique [31] to create new artificial neighbors for the point of interest. To query t-SNE, the entire DR process is re-applied for each sampled instance, since the t-SNE mapping function is unknown. Finally, BIR [32], the predecessor of BIOT [23] (a method employed in our work), is used to produce local explanations; BIR finds the rotation of the queried sampled data that results in the best explanation model (in terms of model sparsity and error). While the approach presented in [21] provides nice intuitions about the LD embedding structure, it has several limitations. First, it can only compute one local explanation at a time, for one selected point. Second, the obtained explanation is highly dependent on the artificial sampling. Finally, running the entire NLDR process for all sampled instances is (very) time consuming, and thus prohibits interactivity. The approach presented in this paper addresses the limitations of [21] by (1) directly providing local explanations everywhere in the LD embedding (i.e., globally local explanations), (2) avoiding the need to sample new artificial data points, and (3) relying only on the calculation of linear regression models, which ensures fast processing and hence facilitates interactivity.

3. Proposed approach

This section introduces our proposed approach for globally local and fast explanations of NLDR embeddings. Section 3.1 first summarizes our notations. Section 3.2 then details our methodology, and Section 3.3 finally presents an optional fine-tuning strategy.

3.1. Notations

Matrices are denoted with bold-faced capital letters (e.g., X), vectors with bold-faced lower-case letters (e.g., x), and scalars with lower-case letters (e.g., x). A single element of a matrix is denoted with a lower-case letter with two subscripts (e.g., x_ij), the first indicating the row and the second indicating the column. Instances are indexed by the letter i ∈ {1, ..., n}, features by the letter j ∈ {1, ..., d}, embedding dimensions by the letter k ∈ {1, ..., m} and regions or subcells of the embedding by the letter ℓ ∈ {1, ..., L}.

3.2. General methodology

In [23], the Best Interpretable Orthogonal Transformation (BIOT) method was proposed to explain the dimensions of multidimensional scaling (MDS) embeddings. In the case of t-SNE, such an explanation strategy is not directly applicable because t-SNE only preserves local structure from the high-dimensional data. However, as proposed in [21], t-SNE embeddings may be explained locally. Instead of learning a BIOT explanation model for the entire embedding (i.e., a single global explanation), we propose learning different BIOT models for different regions (or subcells) of the embedding (i.e., local explanations). For a given region, the BIOT model identifies the features that best explain the positioning of points within that region of the embedding, independently of all other regions. This approach can be applied to any nonlinear 2-D embedding, including embeddings generated by t-SNE and its extensions (e.g., [33, 19]) or by other NLDR algorithms (e.g., [16, 17, 9, 34]).

Let X (n × d) be the matrix of d features used to generate the embedding Y (n × 2). Furthermore, let W (d × 2) and w_0 (2 × 1) contain the weights and intercepts of the linear models relating the features in X to each dimension of the embedding Y, where there is one model per dimension. Finally, R (2 × 2) is an orthogonal transformation matrix that is applied to Y to promote model sparsity and prediction quality, and λ > 0 is a hyperparameter controlling model sparsity. For 2-D embeddings, the BIOT objective function for global explanation is

J_0(W, w_0, R) = (1 / (2n)) ∑_{i=1}^{n} ∑_{k=1}^{2} (y_i^⊤ r_k − w_{0k} − x_i^⊤ w_k)^2 + λ ∑_{k=1}^{2} ‖w_k‖_1,    (1)

which is minimized w.r.t. W, w_0 and R under the constraint that R is an orthogonal matrix (R R^⊤ = R^⊤ R = I_2).
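Under the simplifying assumption that the rotation R is fixed to the identity, each term of Eq. (1) reduces to a standard ℓ1-penalized least-squares problem, so the per-region scheme described above (one sparse linear model per region of the embedding and per LD dimension) can be sketched with K-means segmentation plus Lasso fits. This is a simplified stand-in for the actual BIOT optimization, which also learns R; the function and variable names below are illustrative, not those of the released implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def glocal_explanations(X, Y, n_regions=4, lam=0.05, seed=0):
    """Fit one sparse linear model per region and per LD dimension.

    Simplified stand-in for per-region BIOT: the orthogonal rotation R
    of Eq. (1) is fixed to the identity, leaving a Lasso problem per
    dimension. Returns region labels and a (d x 2) weight matrix per region.
    """
    # Segment the embedding automatically, as proposed in Section 3.2
    regions = KMeans(n_clusters=n_regions, n_init=10,
                     random_state=seed).fit_predict(Y)
    W = {}
    for l in range(n_regions):
        S = regions == l  # instances belonging to region l
        W[l] = np.column_stack([
            Lasso(alpha=lam).fit(X[S], Y[S, k]).coef_  # one model per LD dimension k
            for k in range(Y.shape[1])
        ])
    return regions, W

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))          # HD data (synthetic placeholder)
Y = np.c_[X[:, 0] + X[:, 1], X[:, 2]]  # toy 2-D embedding
regions, W = glocal_explanations(X, Y)
# W[l][:, 0] holds the signed weights explaining the first LD axis in region l
```

The signed columns of each W[l] play the role of the linear projection weights visualized in the interface of Section 4.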
Clearly, this objective function can be extended to the case where different model parameters W^(ℓ), w_0^(ℓ) and R^(ℓ) are optimized for different regions ℓ of the embedding, where the set of instances in region ℓ is denoted S_ℓ. In practice, the best segmentation of the embedding into regions is unknown. In this paper, we propose segmenting the embedding automatically by performing K-means on the embedding coordinates. The choice of the hyperparameter K depends on the topology apparent in the embedding and on the granularity of details desired by the user. Other strategies are possible, for instance recursively dividing the LD dimensions along their medians.

3.3. Fine-tuning

In Section 3.2, the proposed strategy for automatic segmentation (K-means) depends on the coordinates of the instances in the embedding. However, the shape and size of the zones that can be explained may not directly depend on the spatial coordinates of the embedding. This means that the regions identified using K-means may not be optimal with respect to the quality of the resulting explanations. In some cases, it is hence useful to fine-tune the final regions by directly considering explanation quality. To do so, we propose a method called Clustered BIOT, which reassigns instances i to explanation regions S_ℓ based on a modification of BIOT. Further details on Clustered BIOT can be found in Appendix A.

4. Experiments and discussion

This section presents an example use-case for the proposed method using an interactive user interface. This user interface is available in the public repository indicated in the abstract. All of the featured embeddings are representations of the winequality-red dataset, available in the UCI machine learning repository [35]. This data set contains 11 physico-chemical variables describing various red wines. The embeddings are produced by a recent NE algorithm that mixes t-SNE gradients with those of a fast stochastic approximation of MDS, which preserves HD data structures across multiple scales [34].

The interface displayed in Fig. 1 shows an embedding with multiple local linear explanations: each explanation is composed of a green and a burgundy axis. Explanation A has been selected by the user; the color transparency of the points increases linearly with the absolute difference between their position in the embedding and the position predicted by the selected linear model (i.e., the greater the error, the more transparent). This enables the user to visualize the portion of the embedding for which the selected linear model is faithful. The right panel depicts the relative importance of the HD features for each axis of the selected explanation (i.e., A in this case), as quantified by the local linear model weights; the horizontal bar under each feature name represents the feature's signed linear projection weight (LPW) on the considered axis, highlighting the importance of the feature in the local explanation. For visual clarity, only the 5 features with the greatest LPW magnitudes are depicted for each local explanation axis. The feature total sulfur dioxide has been selected by the user (mark B). When selecting a feature in the right panel, thick indicators appear on both axes of all local explanations, with lengths proportional to the LPW magnitudes of the corresponding feature on all axes; mark C shows two such indicators. This makes it possible to grasp the influence of an HD feature in the various regions of the entire embedding.

Figure 2: Importance of 3 features in the local explanations of an embedding.

Each view in Fig. 2 shows the importance of a particular feature in the embedding, with the respective feature indicated at the bottom of the panel. The left view highlights that free sulfur dioxide is particularly important when explaining the top portion of the embedding along a vertical direction, whereas the horizontal direction can be partly explained by the concentration of citric acid. We observe that the structures apparent in the bottom-left part of the embedding are not very dependent on the three analyzed features.

5. Conclusion

This work proposes a globally local and fast explanation framework that provides multiple local linear explanations for 2-D data embeddings, enabling the user to assess, at a glance, the importance of different HD features, both locally and across the whole LD embedding. An example use-case demonstrates that the method can effectively reveal zones in the embedding where points are organized according to specific HD features. Finally, some accompanying software is provided (https://github.com/PierreLambert3/glocally_explained), targeting both DR researchers and experts seeking to analyse their data with nonlinear dimensionality reduction visualization tools.

Further works will include testing our framework with actual end-users in the context of a real use case; their feedback will enable the improvement of the various design choices of our interface. In addition, a qualitative comparison with other explainability methods such as LIME will enable a more comprehensive evaluation of the proposed method.

Acknowledgments

This work was supported by Service Public de Wallonie Recherche under grant n° 2010235-ARIAC by DIGITALWALLONIA4.AI. SC is supported by a FRIA grant (F.R.S.-FNRS).

References

[1] J. A. Lee, M. Verleysen, Nonlinear dimensionality reduction, Springer Science & Business Media, 2007.
[2] J. Venna, J. Peltonen, K. Nybo, H. Aidos, S. Kaski, Information retrieval perspective to nonlinear dimensionality reduction for data visualization, Journal of Machine Learning Research 11 (2010).
[3] I. Borg, P. J. F. Groenen, Modern Multidimensional Scaling: Theory and applications, Springer Science & Business Media, 2005.
[4] T. Kohonen, The self-organizing map, Proceedings of the IEEE 78 (1990) 1464–1480. DOI: 10.1109/5.58325.
[5] G. Hinton, S. Roweis, Stochastic neighbor embedding, in: NIPS, volume 15, 2002, pp. 833–840.
[6] I. T. Jolliffe, Principal component analysis and factor analysis, in: Principal component analysis, Springer, 1986, pp. 115–128.
[7] J. W. Sammon, A nonlinear mapping for data structure analysis 100 (1969) 401–409.
[8] J. B. Tenenbaum, V. De Silva, J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323. DOI: 10.1126/science.290.5500.2319.
[9] S. T. Roweis, L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323–2326.
[10] J. Suykens, Data visualization and dimensionality reduction using kernel maps with a reference point, IEEE Trans. Neural Netw. 19 (2008) 1501–1517.
[11] D. Francois, V. Wertz, M. Verleysen, The concentration of fractional distances 19 (2007) 873–886.
[12] J. A. Lee, M. Verleysen, Quality assessment of dimensionality reduction: Rank-based criteria, Neurocomputing 72 (2009) 1431–1443.
[13] B. Schölkopf, A. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation 10 (1998) 1299–1319.
[14] J. A. Lee, M. Verleysen, Shift-invariant similarities circumvent distance concentration in stochastic neighbor embedding and variants, Procedia Computer Science 4 (2011) 538–547.
[15] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008).
[16] L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction, arXiv preprint arXiv:1802.03426 (2018).
[17] J. A. Lee, D. H. Peluffo-Ordóñez, M. Verleysen, Multi-scale similarities in stochastic neighbour embedding: Reducing dimensionality while preserving both local and global structure, Neurocomputing 169 (2015) 246–261.
[18] C. de Bodt, D. Mulders, M. Verleysen, J. A. Lee, Perplexity-free t-SNE and twice Student tt-SNE, in: ESANN, 2018, pp. 123–128.
[19] C. de Bodt, D. Mulders, M. Verleysen, J. A. Lee, Fast multiscale neighbor embedding, IEEE Transactions on Neural Networks and Learning Systems (2020).
[20] M. Wattenberg, F. Viégas, I. Johnson, How to use t-SNE effectively, Distill 1 (2016) e2.
[21] A. Bibal, V. M. Vu, G. Nanfack, B. Frénay, Explaining t-SNE embeddings locally by adapting LIME, in: ESANN, 2020, pp. 393–398.
[22] A. Bibal, A. Clarinval, B. Dumas, B. Frénay, IXVC: An interactive pipeline for explaining visual clusters in dimensionality reduction visualizations with decision trees, Array 11 (2021) 100080.
[23] A. Bibal, R. Marion, R. von Sachs, B. Frénay, BIOT: Explaining multidimensional nonlinear MDS embeddings using the best interpretable orthogonal transformation, Neurocomputing 453 (2021) 109–118.
[24] L. Pagliosa, P. Pagliosa, L. G. Nonato, Understanding attribute variability in multidimensional projections, in: 2016 29th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), IEEE, 2016, pp. 297–304.
[25] D. B. Coimbra, R. M. Martins, T. T. Neves, A. C. Telea, F. V. Paulovich, Explaining three-dimensional dimensionality reduction plots, Information Visualization 15 (2016) 154–172.
[26] M. Cavallo, Ç. Demiralp, A visual interaction framework for dimensionality reduction based data exploration, in: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–13.
[27] X. Yuan, D. Ren, Z. Wang, C. Guo, Dimension projection matrix/tree: Interactive subspace visual exploration and analysis of high dimensional data, IEEE Transactions on Visualization and Computer Graphics 19 (2013) 2625–2633.
[28] T. Fujiwara, O.-H. Kwon, K.-L. Ma, Supporting analysis of dimensionality reduction results with contrastive learning, IEEE Transactions on Visualization and Computer Graphics 26 (2019) 45–55.
[29] W. E. Marcilio-Jr, D. M. Eler, Explaining dimensionality reduction results using Shapley values, Expert Systems with Applications 178 (2021) 115020.
[30] M. T. Ribeiro, S. Singh, C. Guestrin, "Why should I trust you?": Explaining the predictions of any classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.
[31] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research 16 (2002) 321–357.
[32] R. Marion, A. Bibal, B. Frénay, BIR: A method for selecting the best interpretable multidimensional scaling rotation using external variables, Neurocomputing 342 (2019) 83–96.
[33] B. Kang, D. García García, J. Lijffijt, R. Santos-Rodríguez, T. De Bie, Conditional t-SNE: more informative t-SNE embeddings, Machine Learning 110 (2021) 2905–2940.
[34] P. Lambert, C. de Bodt, M. Verleysen, J. A. Lee, SQuadMDS: A lean stochastic quartet MDS improving global structure preservation in neighbor embedding like t-SNE and UMAP, Neurocomputing 503 (2022) 17–27.
[35] D. Dua, C. Graff, UCI machine learning repository, 2017. URL: http://archive.ics.uci.edu/ml.

A. Clustered BIOT

As mentioned in Section 3.3, the main method proposed in this paper can be fine-tuned with a method we call Clustered BIOT. Let z_iℓ = 1 if instance i is in region ℓ and 0 otherwise. The matrix Z containing all elements z_iℓ respects the general conventions of hard clustering (each instance belongs to exactly one cluster and each cluster contains at least one instance). Then, the objective function for Clustered BIOT is

J_1(Z, {W^(ℓ), w_0^(ℓ), R^(ℓ)}_{ℓ=1}^{L}) = (1 / (2n)) ∑_{i=1}^{n} ∑_{ℓ=1}^{L} z_{iℓ} ∑_{k=1}^{2} (y_i^⊤ r_k^(ℓ) − w_{0k}^(ℓ) − x_i^⊤ w_k^(ℓ))^2 + λ ∑_{ℓ=1}^{L} ∑_{k=1}^{2} ‖w_k^(ℓ)‖_1,    (2)

which is minimized w.r.t. Z and {W^(ℓ), w_0^(ℓ), R^(ℓ)}_{ℓ=1}^{L} under the constraints that (i) R^(ℓ) is an orthogonal matrix ∀ℓ and (ii) Z respects the clustering conventions above.

For fixed Z, the solution for {W^(ℓ), w_0^(ℓ), R^(ℓ)}_{ℓ=1}^{L} can be found by training BIOT on each subset of instances S_ℓ, where S_ℓ := {i | z_iℓ = 1}. For fixed W^(ℓ), w_0^(ℓ) and R^(ℓ) and a given instance i, the solution for z_i is the vector that minimizes

∑_{ℓ=1}^{L} z_{iℓ} ∑_{k=1}^{2} (y_i^⊤ r_k^(ℓ) − w_{0k}^(ℓ) − x_i^⊤ w_k^(ℓ))^2.    (3)

Since only one element of z_i can be equal to one (instance i can belong to only one cluster), the optimal cluster for instance i is whichever model ℓ minimizes the prediction error:

arg min_ℓ ∑_{k=1}^{2} (y_i^⊤ r_k^(ℓ) − w_{0k}^(ℓ) − x_i^⊤ w_k^(ℓ))^2.    (4)

Thus, Clustered BIOT can be optimized by alternating between clustering instances according to prediction error and fitting BIOT models to the clusters. An instance i is assigned to cluster ℓ if BIOT model ℓ has the lowest prediction error for that instance compared to the other models.
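The alternating scheme above can be sketched as follows, again fixing the rotations R^(ℓ) to the identity and letting Lasso stand in for the full BIOT fit. This is only an illustrative sketch under those assumptions, not the released implementation: clusters that become empty during reassignment are simply dropped here, whereas the hard-clustering convention on Z would require handling them explicitly, and all names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso

def clustered_fit(X, Y, z, L, lam=0.05):
    """Fit one Lasso per non-empty cluster and per LD dimension (R fixed to identity)."""
    return {l: [Lasso(alpha=lam).fit(X[z == l], Y[z == l, k])
                for k in range(Y.shape[1])]
            for l in range(L) if np.any(z == l)}

def clustered_biot(X, Y, z0, L, n_iter=5, lam=0.05):
    """Alternate model fitting (Eq. 2 for fixed Z) and reassignment (Eq. 4)."""
    z = z0.copy()
    for _ in range(n_iter):
        models = clustered_fit(X, Y, z, L, lam)
        # Per-instance squared prediction error under each cluster's model
        err = np.column_stack([
            sum((models[l][k].predict(X) - Y[:, k]) ** 2
                for k in range(Y.shape[1]))
            for l in sorted(models)
        ])
        new_z = np.array(sorted(models))[err.argmin(axis=1)]  # Eq. (4)
        if np.array_equal(new_z, z):
            break  # assignments are stable: converged
        z = new_z
    return z, models

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
# Piecewise-linear toy map: no single global linear model fits both halves
Y = np.c_[np.where(X[:, 0] > 0, X[:, 1], -X[:, 1]), X[:, 2]]
z0 = (X[:, 0] > 0).astype(int)  # initial segmentation (e.g., from K-means)
z, models = clustered_biot(X, Y, z0, L=2)
```

Each iteration refits the per-cluster models and then moves every instance to the cluster whose model predicts its embedding position best, mirroring the alternating optimization described above.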