<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards XAI for Optimal Transport</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Philip</forename><surname>Naumann</surname></persName>
							<email>p.naumann@tu-berlin.de</email>
							<affiliation key="aff0">
								<orgName type="laboratory">Machine Learning Group</orgName>
								<orgName type="institution">Technical University of Berlin</orgName>
								<address>
									<addrLine>Marchstr. 23</addrLine>
									<postCode>10587</postCode>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
<orgName type="department">BIFOLD - Berlin Institute for the Foundations of Learning and Data</orgName>
								<address>
									<addrLine>Ernst-Reuter Platz 7</addrLine>
									<postCode>10587</postCode>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards XAI for Optimal Transport</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">235A3A7FB0180788B3528E0478B8D2C4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:36+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Explainable AI</term>
					<term>Optimal Transport</term>
					<term>Distribution Shifts</term>
					<term>Counterfactual Explanations</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Transport phenomena (or distribution shifts) arise in many disciplines and are often of great scientific interest. Machine learning (ML) is increasingly used in conjunction with optimal transport (OT) to learn models of such shifts. While explainable AI (XAI) has improved the transparency of ML models, there has been little discussion of how to explain the factors that drive a distribution shift. Specifically, the issue of opening the OT black box has received only limited attention. Traditional classification models can distinguish between two distributions, but post-hoc explanations based on their gradients may not reveal the true reasons behind their differences. Our goal is to make OT explainable and to establish XAI-OT in order to generate more accurate explanations for distribution shifts. We also discuss concerns regarding the accuracy of optimal transport in the presence of data issues, which we assume to have implications beyond explanations.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Transport phenomena are a crucial focus of scientific research and can manifest themselves in the form of a distribution shift. Understanding these shifts can provide new insights into the factors that led to the observed changes. This can assist scientists in investigating real-world scenarios and is receiving increased attention. Examples include the recent DistShift workshop <ref type="bibr" target="#b0">[1]</ref> at NeurIPS 2022 and the WILDS benchmark <ref type="bibr" target="#b1">[2]</ref>.</p><p>Machine learning (ML) is widely and successfully used to learn from data. Typical tasks include classification or regression. Several methods are available to explain the classification outcome of a model (e.g. <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>). They can provide valuable insights into the modeled data, helping practitioners better comprehend the underlying phenomena. However, not much focus has been put on understanding distribution shifts so far <ref type="bibr" target="#b4">[5]</ref>. Moreover, ML models themselves can be subject to these shifts, which can degrade their performance (cf. continual learning <ref type="bibr" target="#b5">[6]</ref>). Finding and understanding the reasons for a shift is therefore highly important. Additionally, there is evidence (see <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b4">5]</ref> and section 4) that conventional classifiers that discriminate between two distributions are insufficient to accurately detect the underlying shift reasons. Our work aims to fill this gap.</p><p>Various methods can be used to study the relationships between distributions. A particular framework is called optimal transport (OT). Its underlying theory is well studied and comes with guarantees on the optimality of the solution (cf. <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>).
It solves an optimization problem that yields a distance between a source and a target distribution, the so-called Wasserstein distance. In addition, it induces a transportation plan that specifies how mass is allocated between each source and target point. This plan can be used to transport points between the two distributions. Under certain assumptions (cf. <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>), the plan becomes a unique mapping function. Since the OT map is considered an 'optimal' model of the relationship between two distributions, it is a valuable tool for analyzing and explaining shifts <ref type="bibr" target="#b4">[5]</ref>. We see major challenges with this, however:</p><p>It is unclear how to summarize and extract the most intrinsic and relevant information from the maps. Even though they already hold valuable information on the reasons for the shift (cf. <ref type="bibr" target="#b4">[5]</ref>), we argue that OT does not directly explain the mapping in a human-comprehensible way. While it might be sufficiently transparent for a few data points in low-dimensional spaces, it quickly becomes difficult to interpret as the dimensionality increases. Because of this, we regard OT solutions as a 'black box', similar to deep neural networks (DNNs) in ML. Our goal is to move beyond this black box and make OT maps more explainable.</p><p>Furthermore, as intriguing as the theoretical guarantees of OT sound, there are also potential pitfalls where it leads to a solution that is sub-optimal or even wrong. Even though the solution is 'optimal' from a theoretical perspective for the data at hand, there is no guarantee that the data itself is. Most real-world datasets are only an empirical sample of the true population. Since this sample is not necessarily representative, it is questionable whether OT can provide a truthful approximation or even the correct solution in these cases.
Statistical problems in the data are known to cause issues (e.g. <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12]</ref> investigate the effect of outliers). We see one root cause for this in the strict mathematical formulation of OT, as it does not handle incomplete or incorrect data well. For this reason, it is especially important to consider the data and investigate it for potential issues. If we can explain OT maps, such issues may be revealed in the process, aiding users in adjusting their data and model accordingly.</p><p>Apart from this, the cost function is another bottleneck for the success of the optimization. Since it is the main component of the OT objective, it heavily affects the solution. It is known that inappropriate cost functions lead to unexpected or sub-optimal solutions (e.g. <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>). For image data, e.g., it is usually not appropriate to apply the Euclidean distance in the input space. Still, the squared Euclidean distance is a common go-to cost function, as it provides valuable theoretical properties in the context of OT (cf. <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>). This suggests that it is also important to carefully consider whether the chosen cost function is appropriate for the problem at hand. More expressive representations of the data might be required.</p></div>
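To make the discrete setting concrete, the following is a minimal sketch (not from the paper) of empirical OT between two equal-size samples with uniform weights, in which case the Kantorovich problem reduces to a linear assignment problem. The toy data and variable names are illustrative; NumPy and SciPy are assumed to be available.

```python
# Minimal sketch: empirical optimal transport between two equal-size samples.
# With uniform weights and n = m points, the Kantorovich problem reduces to a
# linear assignment problem. Toy data and variable names are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
source = rng.normal(size=(50, 2))                 # source distribution sample
target = source + np.array([3.0, 0.0])            # shift along the x-axis only

# Squared Euclidean cost matrix, the common go-to cost function for OT.
cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(-1)

# Optimal transport plan: here a one-to-one matching (a permutation).
row, col = linear_sum_assignment(cost)

# Empirical 2-Wasserstein distance induced by the optimal plan.
w2 = np.sqrt(cost[row, col].mean())
print(round(w2, 2))  # → 3.0, the magnitude of the pure translation
```

For general (weighted or unequal-size) measures, dedicated solvers such as those in the POT library (`ot.emd`, `ot.emd2`) solve the full linear program; the sketch only covers the uniform matching case.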
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Counterfactual explanations <ref type="bibr" target="#b2">[3]</ref> can be seen as a special form of a distribution shift. These shifts occur at the decision level of a given classification model. They aim to answer the question of what an input would look like if it belonged to a different class <ref type="bibr" target="#b2">[3]</ref>. A typical requirement is that the perturbation that leads to the other class should be applied with minimal effort. Additionally, the problem formulation depends on the decision function of a classifier. Without taking the nature of the data into account, it can lead to the computation of an adversarial attack <ref type="bibr" target="#b14">[15]</ref>. It is now widely accepted that truthful counterfactual explanations should stay on the data manifold (e.g. <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17]</ref>). Apart from using surrogate models, this can also be enforced via explicit constraints that guide the generation process (e.g. <ref type="bibr" target="#b17">[18]</ref>). Some works, e.g. <ref type="bibr" target="#b18">[19]</ref>, have begun to use OT for this purpose. The main advantage over previous approaches is that the whole distribution is considered in the process. Traditional counterfactual methods often focus on optimizing for a single instance and do not take the underlying distribution into account.</p><p>Recently, works have emerged that specifically call for a need to explain distribution shifts <ref type="bibr" target="#b19">[20]</ref>. One particularly interesting direction uses optimal transport for this purpose <ref type="bibr" target="#b4">[5]</ref>. The authors propose two different methods: one aims to explain shifts in a subset of features, and the other uses clustering to find differing modalities.
While the former can be used to restrict the explanation to certain features, the latter can explain sub-shifts within the major shift. Both methods return a counterfactual at the data level in the form of a mean shift towards either the subset of features or the different clusters (i.e. one mean shift per cluster). Since using OT to explain distribution shifts appears to be promising, we want to investigate this direction further.</p><p>Another recent work <ref type="bibr" target="#b20">[21]</ref> uses OT to learn a classifier whose gradient is guaranteed to point to the other class by design. This provides two interesting properties: it makes the classifier more robust to adversarial attacks and makes the gradient more informative. Further, this property of the gradient also bears a strong resemblance to counterfactual explanations, as the authors note <ref type="bibr" target="#b20">[21]</ref>. By following the gradient path, a potentially useful explanation emerges instead of an adversarial example. In contrast to their work, we do not aim to learn a new classifier with OT properties but rather to retrieve explanations that can be independent of a surrogate ML model.</p><p>It is known that OT maps are highly sensitive to data issues. The popular Wasserstein Generative Adversarial Networks (WGANs) <ref type="bibr" target="#b21">[22]</ref>, for example, were proposed as a more robust alternative to standard GANs <ref type="bibr" target="#b22">[23]</ref>. They use an OT-based loss function to learn the generative model. Since OT also considers the geometry of the data, the authors found this loss design to be more robust to the issue of mode collapse <ref type="bibr" target="#b21">[22]</ref>. However, in <ref type="bibr" target="#b9">[10]</ref> the authors found that WGANs are still affected by other issues. They are not robust to outliers in the data, which can lead to undesired image generations.
This can be a serious practical issue, as there is no guarantee that the model will not produce inappropriate images. Moreover, it was shown in <ref type="bibr" target="#b23">[24]</ref> that WGANs are not necessarily learning the correct Wasserstein distance, even though they specifically optimize for it. Surprisingly, they still perform well on their main task of data generation. This raises the question of how important an 'optimal' transport is.</p><p>Recently, other transport-based models like Cycle-GAN <ref type="bibr" target="#b24">[25]</ref> have been investigated in terms of data issues as well. In <ref type="bibr" target="#b13">[14]</ref>, the authors criticize that the mappings of Cycle-GAN are seemingly random. They improve this by incorporating an OT loss to consider the geometry of the data and produce more coherent mappings. Moreover, they show that Cycle-GAN transport can fail to align with human expectations in the presence of missing data. This indicates that data issues are a concern for other transport-based models as well, giving the topic of detecting such problems relevance beyond OT.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Research Questions and Approach</head><p>While the black box of classical machine learning models like classifiers has been successfully opened (cf. <ref type="bibr" target="#b25">[26]</ref>), explanation techniques for models of distribution shifts have received little attention. Recently, optimal transport has been used to explain distribution shifts <ref type="bibr" target="#b4">[5]</ref>. However, we argue that OT models are still largely a black box as they are not directly human-comprehensible. We aim to fill this gap by investigating two primary topics to establish XAI-OT:</p><p>(1) Can we design XAI techniques to faithfully explain OT models so that they become interpretable for humans? We want to develop XAI methods for opening the OT black box. Our investigation will assess whether existing XAI techniques apply to distribution shifts, or whether specific techniques that build tightly on OT maps need to be designed. The preliminary evidence in section 4 suggests that the gradient of classifier DNNs is not suitable for this task in some cases and that OT provides a more truthful explanation. In practice, this may take the form of attributing the Wasserstein distance across input features, either globally or at the level of individual data points. For this purpose, we will investigate perturbation methods, e.g. gradient-based, or propagation-based techniques like layer-wise relevance propagation (LRP) <ref type="bibr" target="#b3">[4]</ref>. Notably, exploring the Kantorovich dual representation of OT (e.g. <ref type="bibr" target="#b26">[27]</ref>) appears to be promising for this, since it can be expressed as a function of the input. Additionally, we will evaluate the faithfulness and interpretability of the generated explanations. Toward this end, we will explore techniques such as pixel-flipping or human evaluations.</p><p>(2) Can we use XAI-OT to gain insights into real-world transport phenomena?
As using OT to explain distribution shifts shows promise <ref type="bibr" target="#b4">[5]</ref>, we want to investigate its potential further. Concretely, we aim to use XAI-OT to explain real-world transport phenomena, such as simulated processes or shifts between different data sources. XAI-OT may also be used to inspect the quality of the OT model itself, in particular to diagnose potential issues such as overfitting effects or reliance on spurious correlations in the data (cf. <ref type="bibr" target="#b27">[28]</ref>). This way, it can help to find out why a mapping failed to meet expectations, so that a user can act upon it and correct the model or data. We will also explore the intriguing connection to counterfactual explanations, as highlighted in, e.g., <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b18">19]</ref>. Our goal is to understand how effective OT is for generating explanations and in which contexts it is most beneficial. Finally, we aim to investigate its usefulness for uncovering novel relationships across various domains, particularly in fields of significance such as medicine or chemistry.</p></div>
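As a toy illustration of the attribution idea in (1), the squared Euclidean transport cost can be decomposed per input feature under the optimal plan. This is only one possible realization we might explore, not an established XAI-OT method; the data and names below are made up, and the uniform matching simplification from before is assumed.

```python
# Hypothetical sketch: attribute the (squared) 2-Wasserstein distance to input
# features by decomposing the squared Euclidean cost under the optimal plan.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
source = rng.normal(size=(100, 3))
target = source + np.array([2.0, 0.0, 0.0])       # shift only in feature 0

cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(-1)
row, col = linear_sum_assignment(cost)

# Mean squared displacement per feature over the matched pairs, normalized to
# a relevance distribution: a global, feature-wise explanation of the shift.
disp = target[col] - source[row]
attribution = (disp ** 2).mean(axis=0)
relevance = attribution / attribution.sum()
print(relevance.round(2))  # → [1. 0. 0.]: only feature 0 drives the shift
```

A per-point variant (attributing `disp[i] ** 2` for a single `i`) would give the local explanations mentioned above.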
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Preliminary Results</head><p>We now discuss our preliminary analysis, suggesting that existing XAI techniques may not be suitable for explaining distribution shifts and that specific XAI solutions for OT need to be developed. In fig. <ref type="figure" target="#fig_1">1</ref> we demonstrate the divergence between the classifier and OT gradients. The target data represents a shift of the source data that occurred only along the x-axis. This means only one feature is relevant to explain the shift. A classifier 𝑓 : X → {0, 1} was trained to discriminate between the two datasets. Additionally, 𝜑 : X → R is the so-called Kantorovich potential <ref type="bibr" target="#b8">[9]</ref>, learned by a separate neural network.</p><p>Feature relevance: gradient vs. OT: Even though 𝑓 learns a decision boundary that discriminates well between the two classes (i.e. the dashed line between source and target in fig. <ref type="figure" target="#fig_1">1</ref>), its gradients do not explain the data shift correctly. As expected, they point to the decision boundary and suggest that the y-axis is also relevant for the shift. Such false attributions of feature relevance are a concern in neuroscience <ref type="bibr" target="#b6">[7]</ref>, giving this issue important practical implications. The OT potential, on the other hand, detects the true shift cause. The contour lines of the potential function are depicted as solid lines and are approximately orthogonal to the true shift direction. This behavior of the potential was also used in <ref type="bibr" target="#b20">[21]</ref> to learn classifiers whose gradients are aligned with the distributions. To conclude, this simple example illustrates why the gradient of a classification model can be deceptive as an explanation for distribution shifts. It does not account for the underlying data distribution and gives too much weight to uninvolved features.
Subsequent XAI techniques that make use of the gradient information are therefore expected to provide a wrong explanation for the occurrence of the shift.</p><p>Counterfactual explanations: Another interesting observation can be made in terms of counterfactual explanations. The red squares in fig. <ref type="figure" target="#fig_1">1</ref> exemplify simple counterfactuals that were computed to possess high target-class confidence (≥ 95%) according to the classifier. As can be seen, they lie on the data manifold and satisfy the shortest-perturbation criterion. However, when we compare them to the OT locations (green squares), it becomes obvious that just staying on the manifold is not necessarily sufficient. In the case of the classifier counterfactuals, the original, relative position of the source points within their distribution is not reflected well in the target distribution. In contrast, the OT map provides better target representations as it considers the whole distribution. Moreover, simple counterfactual explanations likely have difficulties in reaching the outer points that the OT map hits; some parts of the distribution could be hardly reachable for a standard counterfactual. We think that exactly this benefit of OT is crucial for truthful explanations.</p><p>Besides, even though the previous examples suggest that OT is an intriguing tool for explaining data shifts, it is unclear how to summarize the map. Moreover, OT does not always work well, as data issues can distort the map. For these reasons, we want to focus our research in the direction of XAI-OT.</p></div>
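The OT-counterfactual behavior described above can be reproduced in a small numerical experiment (a sketch under the same uniform-matching assumptions as before; the shuffled-copy construction is ours for illustration): each source point's counterfactual is the target point assigned to it by the optimal plan, which preserves its relative position within the distribution.

```python
# Sketch: OT-based counterfactuals. The target is a shuffled, translated copy
# of the source, so the optimal plan should recover each point's own shifted
# twin, preserving its relative position within the distribution.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
source = rng.normal(size=(80, 2))
shift = np.array([4.0, 0.0])
target = source[rng.permutation(80)] + shift      # order no longer aligned

cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(-1)
row, col = linear_sum_assignment(cost)

# OT counterfactual of each source point: its matched target point.
counterfactuals = target[col]
print(np.allclose(counterfactuals, source + shift))  # → True
```

A classifier-based counterfactual would instead stop at the nearest high-confidence point, which need not coincide with the point's shifted twin.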
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Outlook</head><p>Identifying the true factors that drive data shifts provides valuable information. Gaining such knowledge has wide-ranging implications in other scientific fields. Thus, we aim to leverage XAI for optimal transport. A major goal is to propose a method that can uncover previously unknown relationships, possibly helping scientific research in significant fields such as medicine.</p><p>Optimal transport is increasingly used in various fields of ML. We assume that many users do not pay specific attention to the impact of data quality or the utilized cost function on OT. It might even be a mostly unknown pitfall, since OT losses may still appear to work in practice. Thus, we want to raise awareness of these issues and their possible consequences for OT. More robustness will likely lead to even better results. This could mean, e.g., having a human-in-the-loop type of feedback. That is, a user may post-hoc diagnose their OT model with the tools we provide and possibly act to resolve any revealed issues.</p><p>Lastly, there is evidence that our hypotheses on the statistical data issues do not only apply to optimal transport, but to other transport-based models (e.g. Cycle-GAN) as well. For example, <ref type="bibr" target="#b13">[14]</ref> shows that Cycle-GANs cannot naturally handle data gaps, which leads to wrong mappings. In a broader scope, data issues are already known to cause problems in classical ML models <ref type="bibr" target="#b27">[28]</ref>. This means that our investigations aim to extend the literature in this direction by analyzing the behavior and robustness of transport-based models in general.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: A comparison of classifier vs. OT explanations in the context of distribution shifts.</figDesc></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>I would like to express my gratitude to Grégoire Montavon and Jacob Kauffmann for their invaluable assistance with this project. I gratefully acknowledge funding from the German Federal Ministry of Education and Research under the grant BIFOLD24B.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://sites.google.com/view/distshift2022" />
		<title level="m">NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and Applications</title>
				<imprint>
			<date type="published" when="2022-04-15">2022. 15-April-2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">WILDS: A Benchmark of in-the-Wild Distribution Shifts</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">W</forename><surname>Koh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sagawa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Marklund</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Balsubramani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 38th International Conference on Machine Learning</title>
				<meeting>the 38th International Conference on Machine Learning<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="5637" to="5664" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Counterfactual explanations without opening the black box: Automated decisions and the GDPR</title>
		<author>
			<persName><forename type="first">S</forename><surname>Wachter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mittelstadt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Russell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Harvard Journal of Law and Technology</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="841" to="887" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Layer-Wise Relevance Propagation: An Overview</title>
		<author>
			<persName><forename type="first">G</forename><surname>Montavon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Binder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lapuschkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-28954-6_10</idno>
	</analytic>
	<monogr>
		<title level="m">Explainable AI: Interpreting, Explaining and Visualizing Deep Learning</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="193" to="209" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Towards Explaining Distribution Shifts</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kulinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">I</forename><surname>Inouye</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th International Conference on Machine Learning</title>
				<meeting>the 40th International Conference on Machine Learning<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="17931" to="17952" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Online continual learning with natural distribution shifts: An empirical study with visual data</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Sener</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Koltun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)</title>
				<meeting>the IEEE/CVF International Conference on Computer Vision (ICCV)</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="8281" to="8290" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">On the interpretation of weight vectors of linear models in multivariate neuroimaging</title>
		<author>
			<persName><forename type="first">S</forename><surname>Haufe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Meinecke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Görgen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dähne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-D</forename><surname>Haynes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Blankertz</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neuroimage.2013.10.067</idno>
	</analytic>
	<monogr>
		<title level="j">NeuroImage</title>
		<imprint>
			<biblScope unit="volume">87</biblScope>
			<biblScope unit="page" from="96" to="110" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Optimal Transport: Old and New, Grundlehren Der Mathematischen Wissenschaften</title>
		<author>
			<persName><forename type="first">C</forename><surname>Villani</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-540-71050-9</idno>
		<imprint>
			<date type="published" when="2008">2008</date>
			<publisher>Springer</publisher>
			<pubPlace>Berlin Heidelberg</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Peyré</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cuturi</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1803.00567</idno>
		<title level="m">Computational Optimal Transport</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Balaji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chellappa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Feizi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="12934" to="12944" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Outlier-robust optimal transport</title>
		<author>
			<persName><forename type="first">D</forename><surname>Mukherjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Guha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Solomon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yurochkin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 38th International Conference on Machine Learning</title>
				<meeting>the 38th International Conference on Machine Learning<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">139</biblScope>
			<biblScope unit="page" from="7850" to="7860" />
		</imprint>
	</monogr>
	<note>Proceedings of Machine Learning Research</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Outlier-robust optimal transport: Duality, structure, and statistical analysis</title>
		<author>
			<persName><forename type="first">S</forename><surname>Nietert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Goldfeld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cummings</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Artificial Intelligence and Statistics, AISTATS 2022</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022-03">March 2022</date>
			<biblScope unit="volume">151</biblScope>
			<biblScope unit="page" from="11691" to="11719" />
		</imprint>
	</monogr>
	<note>Proceedings of Machine Learning Research</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Making transport more robust and interpretable by moving data through a small number of anchor points</title>
		<author>
			<persName><forename type="first">C.-H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Azabou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 38th International Conference on Machine Learning</title>
				<meeting>the 38th International Conference on Machine Learning<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="6631" to="6641" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">CycleGAN Through the Lens of (Dynamical) Optimal Transport</title>
		<author>
			<persName><forename type="first">E</forename><surname>De Bézenac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gallinari</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-86520-7_9</idno>
	</analytic>
	<monogr>
		<title level="m">Machine Learning and Knowledge Discovery in Databases. Research Track</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="132" to="147" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Intriguing properties of neural networks</title>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zaremba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bruna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Erhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Goodfellow</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1312.6199</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Learning Model-Agnostic Counterfactual Explanations for Tabular Data</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pawelczyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Broelemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kasneci</surname></persName>
		</author>
		<idno type="DOI">10.1145/3366423.3380087</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The Web Conference 2020</title>
				<meeting>The Web Conference 2020<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3126" to="3132" />
		</imprint>
	</monogr>
	<note>WWW &apos;20</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Diffeomorphic Counterfactuals With Generative Models</title>
		<author>
			<persName><forename type="first">A.-K</forename><surname>Dombrowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Gerken</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kessel</surname></persName>
		</author>
		<idno type="DOI">10.1109/TPAMI.2023.3339980</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="page" from="3257" to="3274" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Consequence-Aware Sequential Counterfactual Generation</title>
		<author>
			<persName><forename type="first">P</forename><surname>Naumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ntoutsi</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-86520-7_42</idno>
	</analytic>
	<monogr>
		<title level="m">Machine Learning and Knowledge Discovery in Databases</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="682" to="698" />
		</imprint>
	</monogr>
	<note>Research Track</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>You</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nilsson</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2401.13112</idno>
		<title level="m">DISCOUNT: Distributional Counterfactual Explanation With Optimal Transport</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets</title>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Namkoong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Thirty-Seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">On the explainable properties of 1-Lipschitz Neural Networks: An Optimal Transport Perspective</title>
		<author>
			<persName><forename type="first">M</forename><surname>Serrurier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Mamalet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Fel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Béthune</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Boissin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Thirty-Seventh Conference on Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Wasserstein GAN</title>
		<author>
			<persName><forename type="first">M</forename><surname>Arjovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chintala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1701.07875</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Generative Adversarial Nets</title>
		<author>
			<persName><forename type="first">I</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pouget-Abadie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mirza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warde-Farley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ozair</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">27</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Kantorovich strikes back! Wasserstein GANs are not optimal transport?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Korotin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kolesov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Burnaev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="13933" to="13946" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks</title>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Isola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Efros</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICCV.2017.244</idno>
	</analytic>
	<monogr>
		<title level="m">2017 IEEE International Conference on Computer Vision (ICCV)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="2242" to="2251" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications</title>
		<author>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Montavon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lapuschkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Anders</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</author>
		<idno type="DOI">10.1109/JPROC.2021.3060483</idno>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the IEEE</title>
		<imprint>
			<biblScope unit="volume">109</biblScope>
			<biblScope unit="page" from="247" to="278" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Optimal transport mapping via input convex neural networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Makkuva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Taghvaei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Oh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 37th International Conference on Machine Learning</title>
				<meeting>the 37th International Conference on Machine Learning<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020-07-13">July 2020</date>
			<biblScope unit="volume">119</biblScope>
			<biblScope unit="page" from="6672" to="6681" />
		</imprint>
	</monogr>
	<note>Proceedings of Machine Learning Research</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Unmasking Clever Hans predictors and assessing what machines really learn</title>
		<author>
			<persName><forename type="first">S</forename><surname>Lapuschkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wäldchen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Binder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Montavon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</author>
		<idno type="DOI">10.1038/s41467-019-08987-4</idno>
	</analytic>
	<monogr>
		<title level="j">Nature Communications</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">1096</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
