<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Second Glance: A Novel Explainable AI to Understand Feature Interactions in Neural Networks using Higher-Order Partial Derivatives</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Zohaib</forename><surname>Shahid</surname></persName>
							<email>z.shahid@lboro.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="department">Institute for Digital Technologies</orgName>
								<orgName type="institution">Loughborough University London</orgName>
								<address>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yogachandran</forename><surname>Rahulamathavan</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute for Digital Technologies</orgName>
								<orgName type="institution">Loughborough University London</orgName>
								<address>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Safak</forename><surname>Dogan</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute for Digital Technologies</orgName>
								<orgName type="institution">Loughborough University London</orgName>
								<address>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Second Glance: A Novel Explainable AI to Understand Feature Interactions in Neural Networks using Higher-Order Partial Derivatives</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">4F0200384EE20F72FF3D00B9990C6C62</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Feature Interactions</term>
					<term>Higher-Order Sensitivity Analysis</term>
					<term>Interpretable AI</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Neural networks often operate as "black boxes," making it difficult to understand how they arrive at their decisions. To build trust and improve neural networks, it is essential to identify the most salient inputs and how they interact within the network. To address this, we present "Second Glance," a novel approach for performing second-order sensitivity analysis on neural networks with Rectified Linear Unit (ReLU) activations. First-order sensitivity analysis quantifies the individual influence of the input features on the model output. However, it fails to capture how features interact, potentially leading to misleading conclusions. Second-order sensitivity analysis, using second-order partial derivatives, can reveal these interactions, providing a more comprehensive understanding of the model's inner workings. Unfortunately, ReLU activation, a popular choice because of its efficiency, yields zero second-order partial derivatives. To overcome this limitation, Second Glance employs a two-stage strategy. First, it trains a primary neural network with ReLU activations. Then, it trains a separate "surrogate" model using the features of interest as the input and the first-order partial derivatives obtained from the primary model as its output. In this paper, we show that the second-order sensitivity analysis of the original ReLU-activated neural network can be effectively obtained by analyzing the first-order partial derivatives of the surrogate model. We further validate the proposed method through experiments on the popular UCI bank marketing and UCI adult income datasets.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In the context of explainable AI (XAI), sensitivity analysis quantifies and evaluates how sensitive the output of a machine learning model is to changes in its input features. This research focuses on sensitivity analysis for neural networks, which involves assessing the impact of input variations on a network's predictions. First-order sensitivity analysis measures the impact of a single input on the output; it can be thought of as measuring the linear change in the output with respect to an input. Second-order sensitivity analysis examines how different inputs interact to affect the output; it measures the nonlinear change in the output with respect to a pair (or more) of inputs.</p><p>It is understood that a deeper understanding of the behaviour of neural networks can be achieved by quantifying how features interact to affect predictions <ref type="bibr" target="#b0">[1]</ref>. There are many ways to measure feature interactions, such as Shap-iq (computation of Shapley interactions for arbitrary cardinal interaction indices using a sampling-based approximator) <ref type="bibr" target="#b1">[2]</ref> and analyzing the directed graph produced by bivariate methods <ref type="bibr" target="#b2">[3]</ref>. This research focuses on feature interactions in neural networks based on partial derivatives, such as the use of rule ensembles <ref type="bibr" target="#b3">[4]</ref>, the analysis of interactions in non-linear models <ref type="bibr" target="#b4">[5]</ref>, and Integrated Hessians <ref type="bibr" target="#b5">[6]</ref>. Given a function and a point, the interaction effect among a set of features is the partial derivative of the function output with respect to those features. 
The partial derivatives capture the small changes in the function caused by a change in each chosen feature. Our research concerns pair-wise interactions, i.e., second-order partial derivatives, which constitute the elements of the Hessian matrix.</p><p>Neural networks based on ReLU activation functions have valuable properties, such as mitigating the vanishing gradient problem <ref type="bibr" target="#b6">[7]</ref>. Concerning feature interactions, the issue is that ReLU networks are piece-wise linear: they have a zero Hessian almost everywhere, so studying feature interactions in such networks directly is impossible. The proposed approach, Second Glance, shown in Figure <ref type="figure" target="#fig_0">1</ref>, mitigates this issue by taking the first-order partial derivatives of the ReLU-based neural network of interest (primary model M1) and training a surrogate model (M2) on them. Table <ref type="table" target="#tab_0">1</ref> explains what pair-wise feature interactions mean according to the signs (directions) of the first- and second-order partial derivatives, using two inputs as an example. </p></div>
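The zero-Hessian property of piece-wise linear ReLU networks can be checked numerically. The following is a minimal, self-contained Python sketch (the tiny network and its weights are illustrative, not taken from the paper's trained models): away from the ReLU kinks the network is locally linear, so a finite-difference estimate of a mixed second-order partial derivative comes out as zero.

```python
# Tiny hand-weighted ReLU network with two inputs (weights are illustrative,
# not taken from the paper's trained models).
def relu(z):
    return max(0.0, z)

def net(x):
    h1 = relu(0.7 * x[0] - 0.3 * x[1] + 0.1)
    h2 = relu(-0.2 * x[0] + 0.9 * x[1] + 0.4)
    return 1.1 * h1 - 0.8 * h2

def mixed_partial(f, x, i, j, h=1e-3):
    # Central finite-difference estimate of d^2 f / (dx_i dx_j).
    def at(di, dj):
        p = list(x)
        p[i] += di
        p[j] += dj
        return f(p)
    return (at(h, h) - at(h, -h) - at(-h, h) + at(-h, -h)) / (4 * h * h)

# Away from the ReLU "kinks" the network is locally linear, so every
# second-order partial derivative vanishes -- the zero-Hessian problem.
est = mixed_partial(net, [1.0, 1.0], 0, 1)
print(abs(est) < 1e-6)  # True
```

This is exactly why direct differentiation cannot reveal feature interactions in such networks, motivating the surrogate-model approach described below.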
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Explanation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>+𝑣𝑒 +𝑣𝑒</head><p>As the 1st order partial derivative of 𝑥 1 is positive, when 𝑥 1 increases, the output of the neural network increases. As the 2nd order partial derivative is also positive, the rate at which the output changes with respect to 𝑥 2 increases as 𝑥 1 increases. In short, the impact of 𝑥 2 on the output is amplified by 𝑥 1 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>+𝑣𝑒 −𝑣𝑒</head><p>The output will increase, as the 1st order partial derivative is positive. As the 2nd order partial derivative is negative, the rate of change of the output with respect to 𝑥 2 decreases as 𝑥 1 increases. In short, the influence of 𝑥 2 on the output is dampened by 𝑥 1 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>−𝑣𝑒 +𝑣𝑒</head><p>The output will decrease when 𝑥 1 is increased, due to the negative value of the 1st order partial derivative of 𝑥 1 . As the 2nd order partial derivative is positive, the rate of change of the output with respect to 𝑥 2 increases with 𝑥 1 . In summary, increasing 𝑥 1 magnifies the influence of 𝑥 2 on the output, even though 𝑥 1 itself pushes the output down.</p><p>−𝑣𝑒 −𝑣𝑒 The negative sign of the 1st order partial derivative of 𝑥 1 indicates an inversely proportional relationship between 𝑥 1 and the output. The 2nd order partial derivative being negative shows that the effect of 𝑥 2 on the output decreases as 𝑥 1 increases.</p><p>Section 2 gives a brief literature review of gradient-based sensitivity analysis. Section 3 explains the functioning of Second Glance. Section 4 presents experiments on two popular UCI datasets and shows how Second Glance offers an alternative way of estimating feature interactions where zero Hessians are an issue. Overall, Second Glance aims to provide a more granular analysis of feature interactions.</p></div>
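The four sign combinations in Table 1 can be summarized programmatically. Below is a small hypothetical Python helper (the function name and wording are illustrative, not part of the paper) that maps the signs of the two partial derivatives to the qualitative reading above:

```python
# Hypothetical helper mirroring Table 1 (name and wording are illustrative):
# 'first' is the sign of dy/dx1, 'second' the sign of d/dx1 (dy/dx2).
def interpret(first, second):
    direction = "increases" if first > 0 else "decreases"
    effect = "amplified" if second > 0 else "dampened"
    return (f"as x1 grows the output {direction}, and the influence "
            f"of x2 on the output is {effect}")

print(interpret(+1.0, +1.0))  # the +ve / +ve row of Table 1
```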
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Literature Review</head><p>This review focuses on gradient-based sensitivity analysis methods, as they are the most relevant to this research. Given a sample, gradient-based methods use the natural interpretation of the gradient as the infinitesimally local importance. A well-known approach is the saliency map <ref type="bibr" target="#b7">[8]</ref>, which is simply the gradient of the model output with respect to the input. SmoothGrad <ref type="bibr" target="#b8">[9]</ref> mitigated the noise in saliency maps by averaging them and provided sample complexity guarantees. Grad-CAM <ref type="bibr" target="#b9">[10]</ref> is also gradient-based, with the main distinction that the importance is calculated over hidden (internal) layers. The calculation of the Jacobian matrix, i.e., the matrix of first-order partial derivatives, has been thoroughly discussed by <ref type="bibr" target="#b10">[11]</ref>.</p><p>Higher-order interactions are estimated using gradient-based approaches such as Gradient-NID <ref type="bibr" target="#b11">[12]</ref>, which uses the square of the corresponding Hessian element as the strength of a feature interaction. By extending Integrated Gradients to use a path-integrated Hessian, <ref type="bibr" target="#b5">[6]</ref> proposed Integrated Hessians. SmoothHess <ref type="bibr" target="#b0">[1]</ref> convolves the Hessian matrix of a ReLU network with a Gaussian to mitigate the issue of zero Hessians.</p><p>Although these methods handle ReLU networks in their own ways, such as replacing the ReLU function with SoftPlus post hoc before applying Integrated Hessians <ref type="bibr" target="#b5">[6]</ref>, the similar use of SoftPlus activation by <ref type="bibr" target="#b11">[12]</ref>, and the use of Stein's Lemma by <ref type="bibr" target="#b0">[1]</ref>, few methods use surrogate models specifically for second-order sensitivity analysis. 
<ref type="bibr" target="#b12">[13]</ref> uses AI surrogate models to estimate the relationships between input features and ventricular parameters for medical applications, but it does not focus specifically on second-order sensitivity analysis. The commendable work by <ref type="bibr" target="#b13">[14]</ref> uses surrogate models for point cloud deep neural networks based on LIME (Local Interpretable Model-Agnostic Explanations). The use of generalized additive models (surrogate models) with pairwise interactions (GA2M) has been explored to understand the trade-off between accuracy and interpretability in machine learning techniques applied to clinical data <ref type="bibr" target="#b14">[15]</ref>, but it does not focus on using partial derivatives. In contrast, Second Glance targets global explainability by generating the second-order partial derivatives of the primary model using the surrogate model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Proposed Algorithm</head><p>In the two-stage process of Second Glance (Figure <ref type="figure" target="#fig_0">1</ref>), the primary neural network or model (M1) is trained and its first-order partial derivatives are obtained. Together with the inputs, these form a dataset for training the surrogate model or neural network (M2). The surrogate model (M2) is the main contribution: its inputs are the features of the primary model, and its outputs are the first-order partial derivatives from the primary model. The second-order partial derivatives, i.e., the Hessian of the primary model, can then be obtained by calculating the first-order partial derivatives of the surrogate model. If 𝑀 1 takes an input 𝑥 and produces an output 𝑦, its first-order partial derivative is 𝑀 ′ 1 (𝑥), as shown in (1). The first-order partial derivative of 𝑀 1 is used as the output for the surrogate model 𝑀 2 , which takes 𝑥 as input. As shown in (2), the first-order partial derivative of 𝑀 2 is then equal to the second-order partial derivative of 𝑀 1 (represented by 𝜕 2 𝑦 𝜕𝑥 2 ). In other words, this works because in M2 the first-order partial derivatives (from M1) are backpropagated to the inputs, yielding the second-order partial derivatives.</p><formula xml:id="formula_0">𝑦 = 𝑀 1 (𝑥) ; 𝜕𝑦 𝜕𝑥 = 𝑀 ′ 1 (𝑥)<label>(1)</label></formula><formula xml:id="formula_1">𝜕𝑦 𝜕𝑥 = 𝑀 2 (𝑥) ; 𝜕 2 𝑦 𝜕𝑥 2 = 𝑀 ′ 2 (𝑥)<label>(2)</label></formula><p>Following this approach, we can obtain higher-order partial derivatives: e.g., to obtain third-order partial derivatives, we can train a third model (M3) using 𝑥 as inputs and the first-order partial derivatives of 𝑀 2 as outputs. The first-order partial derivatives of M3 would then be the third-order partial derivatives of M1. 
Third-order partial derivatives identify how a joint change in two features affects the influence of a third feature on the prediction. In this preliminary study, we focus only on second-order sensitivity analysis.</p></div>
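The two-stage recipe in (1)-(2) can be illustrated end to end on a toy problem. The sketch below is a deliberate simplification, not the paper's implementation: the primary model's gradient is analytic rather than a trained ReLU network, and the surrogate M2 is a linear least-squares fit rather than a neural network. Differentiating the fitted surrogate (here, reading off its coefficient matrix) recovers the primary model's Hessian:

```python
import numpy as np

# Toy "primary model" M1: y = 2*x1^2 + 3*x1*x2.
# Its true Hessian is [[4, 3], [3, 0]].
def m1_grad(x):
    x1, x2 = x
    return np.array([4.0 * x1 + 3.0 * x2, 3.0 * x1])  # first-order partials

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))            # inputs (stand-in for M1's features)
G = np.array([m1_grad(x) for x in X])    # gradient "labels" taken from M1

# Surrogate "M2": fit G ~ X @ W + b. The Jacobian of this linear M2 with
# respect to x is W.T, which is exactly the Hessian of M1.
Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
coef, *_ = np.linalg.lstsq(Xb, G, rcond=None)
hessian = coef[:2].T
print(np.round(hessian, 3))
```

With a neural surrogate, the same Jacobian would instead be obtained by backpropagating each output of M2 to its inputs, as the paper describes.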
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments with Second Glance</head><p>To test Second Glance, the UCI bank marketing <ref type="bibr" target="#b15">[16]</ref> and UCI adult income <ref type="bibr" target="#b16">[17]</ref> datasets were used; both are classification problems. They were selected because they are well-known benchmark tabular datasets for testing neural networks. The five most influential features from each dataset were selected using SHAP to make the functioning of Second Glance easy to understand and present. However, the proposed approach supports an arbitrary number of features. The UCI bank marketing dataset contains data on marketing campaigns based on phone calls, and the target is to assess whether a client would subscribe to a term deposit (yes, 𝑦 = 1) or not (no, 𝑦 = 0). This dataset has a total of 41,188 instances and 19 multivariate features. The UCI adult income dataset, which aims to predict whether a person makes over $50K per year, is a multivariate dataset with 30,162 instances (after cleaning) and 14 features.</p><p>For simplicity and uniformity, we kept the same architecture for 𝑀 1 for both datasets: 5 inputs, 3 hidden layers with 4 neurons each (ReLU activation in the hidden nodes), and 1 output neuron (Sigmoid activation). Binary cross-entropy was used as the loss. The hidden layers use ReLU activation because the surrogate model (from Second Glance) is created specifically to analyze and mitigate the effect of zero Hessians due to ReLU activations. The selected 5 features and the performance metrics of 𝑀 1 and 𝑀 2 are given in Table <ref type="table" target="#tab_1">2</ref>.</p><p>The primary neural network trained on the UCI bank marketing dataset gives high accuracy, recall, and F1 score. The performance metrics of M1 for the UCI adult income dataset are decent. 
An explainable AI model built from another model only gives accurate explanations as long as the original model performs well, so it is essential to ensure this. As M2 has continuous values (first-order partial derivatives from M1) as its output, the R-squared score was used as its performance metric. Table <ref type="table" target="#tab_2">3</ref> shows some of the first-order partial derivatives obtained from the primary neural network trained on the adult income dataset. As there are 5 inputs, there are also 5 partial derivatives per row. The partial derivatives range between -1 and 1. As discussed in Table <ref type="table" target="#tab_0">1</ref>, positive values mean that the output of the model increases with an increase in the feature, while negative values depict an inverse relationship between the input and the output.</p><p>The surrogate models (M2) were trained for the two datasets with different architectures. It is emphasized that the surrogate model can have any architecture; the given architectures were picked to obtain the best possible performance. Each M2 had 5 input features and the relevant first-order partial derivatives of M1 as outputs (5 output neurons with 𝑡𝑎𝑛ℎ as the activation function to place the continuous values within a suitable range). The input instances (and the choice of inputs) for M2 were the same as for M1 in each case. For the bank marketing dataset, M2 had 3 hidden layers, with ReLU activation in the first 2 layers and sigmoid activation in the last hidden layer. For the adult income dataset, the surrogate model had 5 hidden layers: ReLU in the first 2 layers, GeLU (Gaussian Error Linear Unit) activation in the 3rd and 4th layers, and sigmoid in the last hidden layer. Mean Squared Error was used as the loss in both cases. 
The R-squared scores for the trained surrogate (M2) models in Table <ref type="table" target="#tab_1">2</ref> show that the models performed modestly. The first-order partial derivatives of all M2 models were obtained using (2); accordingly, the first-order partial derivatives (the Jacobian matrix) of M2 indeed represent the Hessian matrices of 𝑀 1 in each case. SHAP interactions were calculated from XGBoost classifiers trained on both datasets to validate the Hessian matrices generated by Second Glance. XGBoost classifiers were used because the present version of the SHAP Python library can only calculate interactions for XGBoost models <ref type="bibr" target="#b17">[18]</ref>. Although SHAP interactions and Second Glance are very different in implementation, feature interactions exist regardless of the model used, as long as the relationship between the features influences the outcome <ref type="bibr" target="#b18">[19]</ref>. In Figure <ref type="figure" target="#fig_2">3</ref>, each value represents the measure of interaction (second-order partial derivative) between the features. The black squares represent highly negative values, the white squares represent highly positive values, and the grey squares represent values in between. The negative and positive values (polarity) play a significant role in interpreting the results. The SHAP interactions and the Hessian matrices were calculated for all instances of both datasets for a better understanding.</p><p>Upon comparing the polarities, it was found that for the bank marketing dataset, 45.4% of the polarities were the same in the SHAP interactions and the Hessian matrices. This figure was 50.4% for the adult income dataset. This validates the correctness of the proposed approach: if the proposed approach's outputs were random, the probability of matching 50% of the polarities between the two approaches would be around (1/2)^12.5 ≈ 0.01%. 
For the selected features of a single datapoint of the bank marketing dataset, the Hessian matrix (Figure <ref type="figure" target="#fig_2">3</ref>) has been compared with the SHAP interactions (Figure <ref type="figure" target="#fig_1">2</ref>). As shown in Figure <ref type="figure" target="#fig_3">4</ref>, nearly 40% of the feature pairs have matching polarities. SHAP interactions show the absolute impact on the output due to the interactions, while Second Glance shows the increase or decrease in the rate of change of the output with respect to the interaction between features (as explained generally in Table <ref type="table" target="#tab_0">1</ref>). For example, for the interaction of emp.var.rate with itself, the effect on the output is positive. The relevant SHAP interaction (Figure <ref type="figure" target="#fig_1">2</ref>) shows that the probability of a client subscribing to a term deposit increased by 2.06%, while Figure <ref type="figure" target="#fig_2">3</ref> shows that this interaction amplifies the influence of emp.var.rate on the output. For the interaction between previous and euribor3m, both heatmaps carry a negative value near zero, confirming that this interaction has little or no effect on the output. The value of −3.4367 corresponding to contact and euribor3m (Figure <ref type="figure" target="#fig_2">3</ref>) means that the influence of contact is lessened or dampened by euribor3m, or vice versa. As the influence of one of the features is being dampened, the corresponding SHAP interaction shows that there is indeed a negative impact on the output, but a small one (the overall output is not much affected). 
In short, the probability of a subscription by a client decreased but not significantly.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: High-level view of the proposed Second Glance algorithm</figDesc><graphic coords="4,89.29,122.08,416.69,242.13" type="bitmap" /></figure>
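The polarity comparison used for validation can be reproduced in a few lines of numpy. The matrices below are small illustrative stand-ins, not the paper's actual 5×5 heatmaps; the same sign-agreement computation applies to the real SHAP-interaction and Hessian matrices:

```python
import numpy as np

# Illustrative 2x2 stand-ins for the SHAP-interaction and Second Glance
# (Hessian) matrices of two features -- NOT values from the paper.
shap_int = np.array([[0.0206, -0.50],
                     [-0.50,   0.10]])
hessian  = np.array([[1.30,   -3.4367],
                     [-3.4367, -0.70]])

# Fraction of entries whose polarities (signs) agree between the two methods.
match = np.mean(np.sign(shap_int) == np.sign(hessian))
print(match)  # 3 of 4 signs agree -> 0.75
```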
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Feature Interaction Based on SHAP.</figDesc><graphic coords="6,113.43,372.24,105.80,105.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Feature Interaction based on Second Glance.</figDesc><graphic coords="6,255.64,372.24,105.80,105.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Matching Polarities.</figDesc><graphic coords="6,397.86,372.24,105.81,106.86" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Explaining the meaning of feature interactions (1st order and 2nd order partial derivatives) in neural networks</figDesc><table><row><cell>𝜕𝑦 𝜕𝑥 1</cell><cell>𝜕 𝜕𝑥 1 ( 𝜕𝑦 𝜕𝑥 2 )</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Table showing the performance metrics for both M1 and M2</figDesc><table><row><cell>Dataset</cell><cell cols="2">Selected 5 features Accuracy of</cell><cell>Recall</cell><cell>F1</cell><cell>R-</cell></row><row><cell></cell><cell></cell><cell>M1</cell><cell>of M1</cell><cell>score</cell><cell>squared</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>of M1</cell><cell>score</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>of M2</cell></row><row><cell>UCI bank</cell><cell>emp.var.rate,</cell><cell>88.9%</cell><cell>98%</cell><cell>0.94</cell><cell>0.914</cell></row><row><cell>marketing</cell><cell>euribor3m,</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>dataset</cell><cell>cons.price.idx,</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>contact,</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>previous</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>UCI adult</cell><cell>age, workclass,</cell><cell>73.1%</cell><cell>77.5%</cell><cell>0.59</cell><cell>0.826</cell></row><row><cell>income</cell><cell>education_num,</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>dataset</cell><cell>marital_status,</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>hours_per_week</cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Table showing some of the first-order partial derivatives of M1 for the UCI adult income dataset</figDesc><table><row><cell>age</cell><cell>workclass</cell><cell>education_num</cell><cell>marital_status</cell><cell>hours_per_week</cell></row><row><cell>0.939</cell><cell>-0.482</cell><cell>1.00</cell><cell>-1.00</cell><cell>0.998</cell></row><row><cell>1.00</cell><cell>-1.00</cell><cell>-0.476</cell><cell>-0.949</cell><cell>-0.300</cell></row><row><cell>0.195</cell><cell>-0.496</cell><cell>1.00</cell><cell>-1.00</cell><cell>0.387</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions and Future Works</head><p>The proposed method, Second Glance, provides a unique post-hoc way to generate Hessians for ReLU-based neural networks. It opens up another research direction in which surrogate models and more granularity can be considered when aiming to generate non-zero Hessians from ReLU-based neural networks. We have conducted preliminary experiments with the tabular UCI bank marketing and UCI adult income datasets, interpreted the results (Hessians) produced by Second Glance, and validated them against SHAP feature interactions. Our research aims to expand Second Glance's capabilities to encompass image datasets. As a future research direction, we will conduct a rigorous comparison against contemporary gradient-based second-order sensitivity analysis algorithms, scrutinizing metrics such as the frequency of zeros in the Hessian and symmetry, while prioritizing enhancements in efficiency.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Smoothhess: Relu network feature interactions via stein&apos;s lemma</title>
		<author>
			<persName><forename type="first">M</forename><surname>Torop</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Masoomi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ioannidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Shap-iq: Unified approximation of any-order shapley interactions</title>
		<author>
			<persName><forename type="first">F</forename><surname>Fumagalli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Muschalik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kolpaczki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hüllermeier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hammer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Masoomi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">P</forename><surname>Hersh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">K</forename><surname>Silverman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Castaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ioannidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2304.07670</idno>
		<title level="m">Explanations of black-box models based on directional feature interactions</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Friedman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">E</forename><surname>Popescu</surname></persName>
		</author>
		<title level="m">Predictive learning via rule ensembles</title>
				<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Interaction terms in logit and probit models</title>
		<author>
			<persName><forename type="first">C</forename><surname>Ai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">C</forename><surname>Norton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Economics letters</title>
		<imprint>
			<biblScope unit="volume">80</biblScope>
			<biblScope unit="page" from="123" to="129" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Explaining explanations: Axiomatic feature interactions for deep networks</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Janizek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sturmfels</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page" from="1" to="54" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Deep sparse rectifier neural networks</title>
		<author>
			<persName><forename type="first">X</forename><surname>Glorot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the fourteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings</title>
				<meeting>the fourteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="315" to="323" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1409.1556</idno>
		<title level="m">Very deep convolutional networks for large-scale image recognition</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Smilkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Thorat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Viégas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wattenberg</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.03825</idno>
		<title level="m">Smoothgrad: removing noise by adding noise</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Grad-cam: Visual explanations from deep networks via gradient-based localization</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Selvaraju</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cogswell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vedantam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Parikh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Batra</surname></persName>
		</author>
		<idno type="DOI">10.1109/iccv.2017.74</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Computer Vision (ICCV)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Pizarroso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Portela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Muñoz</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2002.11423</idno>
		<title level="m">Neuralsens: sensitivity analysis of neural networks</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Tsang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2006.10966</idno>
		<title level="m">Feature interaction interpretability: A case for explaining ad-recommendation systems via neural interaction detection</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Efficient ventricular parameter estimation using ai-surrogate models</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">D M</forename><surname>Talou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">P B</forename><surname>Gamage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Nash</surname></persName>
		</author>
		<idno type="DOI">10.3389/fphys.2021.732351</idno>
	</analytic>
	<monogr>
		<title level="j">Frontiers in Physiology</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Surrogate model-based explainability methods for point cloud nns</title>
		<author>
			<persName><forename type="first">H</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kotthaus</surname></persName>
		</author>
		<idno type="DOI">10.48550/arxiv.2107.13459</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Interpretable machine learning in healthcare through generalized additive model with pairwise interactions (ga2m): Predicting severe retinopathy of prematurity</title>
		<author>
			<persName><forename type="first">T</forename><surname>Karatekin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sancak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Celik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Topcuoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Karatekin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kirci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Okatan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2019 international conference on deep learning and machine learning in emerging applications (deep-ML)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="61" to="66" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Bank Marketing</title>
		<author>
			<persName><forename type="first">S</forename><surname>Moro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cortez</surname></persName>
		</author>
		<idno type="DOI">10.24432/C5K306</idno>
		<ptr target="https://doi.org/10.24432/C5K306" />
	</analytic>
	<monogr>
		<title level="m">UCI Machine Learning Repository</title>
				<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Adult, UCI Machine Learning Repository</title>
		<author>
			<persName><forename type="first">B</forename><surname>Becker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kohavi</surname></persName>
		</author>
		<idno type="DOI">10.24432/C5XW20</idno>
		<ptr target="https://doi.org/10.24432/C5XW20" />
		<imprint>
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">From local explanations to global understanding with explainable ai for trees</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Erion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Degrave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Prutkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Nair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Katz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Himmelfarb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bansal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="56" to="67" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Analyzing attribute dependencies</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jakulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Bratko</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-540-39804-2_22</idno>
	</analytic>
	<monogr>
		<title level="m">Knowledge Discovery in Databases</title>
				<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="229" to="240" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
