<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Sparse Oblique Decision Trees: A Tool to Understand and Manipulate Neural Net Features ⋆</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Suryabhan</forename><surname>Singh Hada</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">work done while at Dept. of Computer Science &amp; Engineering</orgName>
								<orgName type="laboratory">LinkedIn</orgName>
								<orgName type="institution">University of California</orgName>
								<address>
									<settlement>Merced</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Miguel</forename><forename type="middle">Á</forename><surname>Carreira-Perpiñán</surname></persName>
							<email>mcarreira-perpinan@ucmerced.edu</email>
							<affiliation key="aff1">
								<orgName type="department">Dept. of Computer Science &amp; Engineering</orgName>
								<orgName type="institution">University of California</orgName>
								<address>
									<settlement>Merced</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Arman</forename><surname>Zharmagambetov</surname></persName>
							<email>azharmagambetov@ucmerced.edu</email>
							<affiliation key="aff1">
								<orgName type="department">Dept. of Computer Science &amp; Engineering</orgName>
								<orgName type="institution">University of California</orgName>
								<address>
									<settlement>Merced</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Sparse Oblique Decision Trees: A Tool to Understand and Manipulate Neural Net Features ⋆</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">CD1BF5BB7A104B4DF1B6C5B04478AF1C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Decision trees</term>
					<term>Deep neural networks</term>
					<term>Interpretability</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The widespread deployment of deep nets in practical applications has lead to a growing desire to understand how and why such black-box methods perform prediction. Much work has focused on understanding what part of the input pattern (an image, say) is responsible for a particular class being predicted, and how the input may be manipulated to predict a different class. We focus instead on understanding what internal features computed by the neural net are responsible for a particular class. We achieve this by mimicking part of the net with a decision tree having sparse weight vectors at the nodes. We are able to learn trees that are both highly accurate and interpretable, so they can provide insights into the deep net black box. Further, we show we can easily manipulate the neural net features in order to make the net predict, or not predict, a given class, thus showing that it is possible to carry out adversarial attacks at the level of the features. We demonstrate this robustly in MNIST and ImageNet with LeNet5 and VGG networks.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Deep neural nets are accurate black-box models. They are highly successful in terms of predictive performance but remarkably difficult to understand in terms of how exactly they come up with a prediction. These issues have been known to researchers and practitioners for many years, but it is in the 2010s that deep learning has achieved a wild, unexpected success that has attracted widespread attention beyond computer science. Thus making it urgent to understand the behavior of these models in explanatory terms.</p><p>Much work in this regard seeks to understand what a specific neuron in a deep net does. This includes work on finding input patterns that invert the activation of a neuron <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref> or maximally activate it <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>; and work that finds input patterns, or parts of them (such as image regions) that have an important effect on the output class, essentially a sensitivity analysis via gradients or other measure of saliency <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11]</ref>.</p><p>Other work seeks to replace a deep net with a simpler, interpretable model that can then be inspected, such as a decision tree, a set of rules or a (sparse) linear model. This can be done locally around an instance <ref type="bibr" target="#b11">[12]</ref> or globally for all instances-which is much harder since an interpretable model like decision trees cannot generally approach the accuracy of the deep net <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b19">20]</ref>. The fundamental problem with these approaches is that traditional decision tree learning algorithms such as CART <ref type="bibr" target="#b20">[21]</ref> or C4.5 <ref type="bibr" target="#b21">[22]</ref> are unable to learn small yet accurate enough trees to be useful mimicks of a neural net except in very small problems.</p><p>Our paper has two contributions that can improve our ability to explain and manipulate trained deep nets. Firstly, we propose decision trees as a tool to understand deep nets. As mentioned above this is by itself not a new idea. What is new is the specific, novel type of tree we use, and how we apply it to a given deep net. Traditional tree learning algorithms typically construct trees where each decision node thresholds a single input feature. Although such trees are considered among the most interpretable models, this is only true if the tree is relatively small. Unfortunately, such trees often produce too low accuracy, and are wholly inadequate for high-dimensional complex inputs such as pixels of an image or neural net features. 
We capitalize on a recently proposed Tree Alternating Optimization (TAO) algorithm <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b23">24]</ref>, which can learn far more accurate trees that remain small and very interpretable because each decision node operates on a small, learnable subset of features. It has been shown to outperform existing tree algorithms such as CART <ref type="bibr" target="#b20">[21]</ref> or C4.5 <ref type="bibr" target="#b21">[22]</ref> by a large margin <ref type="bibr" target="#b24">[25]</ref>, and to improve forests <ref type="bibr" target="#b25">[26,</ref><ref type="bibr" target="#b26">27]</ref>.</p><p>Second, we apply the tree to an internal layer of the deep net, hence mimicking its remaining (classifier) layers, rather than attempting to mimic the entire deep net.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>This allows us to study the relation between deep net features (neuron activations) and output classes (unlike the work cited in the second paragraph, which studies the relation between input features and neuron activations).</head><p>As a subproduct, inspection of the tree allows us to construct a new kind of adversarial attacks where we manipulate the deep net features via a mask to block a specific set of neurons. This gives us surprising control on what class the deep net will output. Among other possibilities, we can make it output the same, desired class for all dataset instances; or make it never output a given class; or make it misclassify certain pairs of classes.</p><p>Next, we describe how we use trees to understand and manipulate deep net features (section 2), and demonstrate this in MNIST and ImageNet <ref type="bibr" target="#b27">[28]</ref> with LeNet5 and VGG16 <ref type="bibr" target="#b28">[29]</ref> deep nets (section 4).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Sparse oblique trees as a tool to observe a deep neural net</head><p>Our overall approach is as follows (see fig. <ref type="figure" target="#fig_2">3</ref> in the appendix). Assume we have a trained deep net classifier y = f (x), where input x ∈ R D and y ∈ R K . We can write f as: f (x) = g(F(x)), where F represents the features-extraction part (z = F(x) ∈ R F ), and g represents the classifier part (y = g(z)). Then:</p><p>1. Train a sparse oblique tree y = T (z) with TAO (see details in <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b23">24]</ref>) on the training set</p><formula xml:id="formula_0">{(F(x n ), y n )} N n=1 ⊂ R F × {1, . . . , K}.</formula><p>Choose the sparsity hyperparameter λ ∈ [0, ∞) such that, T have close to highest validation accuracy and is as sparse as possible.  </p><formula xml:id="formula_1">k = 0 k = 1 k = 2 k = 3 k = 4 k = 5 k = 6 k = 7 k = 8 k = 9 k = A k = B k = C k = D k = E k = F</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Inspect the tree to find interesting patterns about the deep net.</head><p>Our goal is to achieve a tree that both mimicks well the deep net and is as simple as possible.</p><p>Step 2 is purposely vague. There is probably a wealth of information in the tree regarding the features' meaning and effect on the classification, both at the level of a specific input instance or more globally. Here, we focus on one specific pattern described next.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Manipulating the features of a deep net to alter its classification behavior</head><p>Our overall objective is to control the network prediction by manipulating the value of the deep net features z ∈ R F . We do not alter the network weights, i.e., F and g remain the same. We just alter z into a masked z = µ(z) = µ × ⊙ z + µ + via a multiplicative and an additive mask µ × , µ + ∈ R F , respectively (where "⊙" means elementwise multiplication).</p><p>Original net:</p><formula xml:id="formula_2">y = f (x) = g(F(x))<label>(1)</label></formula><p>Original features:</p><formula xml:id="formula_3">z = F(x)<label>(2)</label></formula><p>Masked net:</p><formula xml:id="formula_4">y = f (x) = g(µ(F(x)))<label>(3)</label></formula><p>Masked features:</p><formula xml:id="formula_5">z = µ(F(x)) = µ(z)<label>(4)</label></formula><p>In the simplest, most intuitive version of the mask, we just need a binary multiplicative mask z = µ × ⊙ z where µ × ∈ {0, 1} F . Using an additive mask and real-valued masks makes the manipulation's effect more robust and harder to detect. We will construct a mask by inspecting the tree, specifically by observing the weight of each feature in each decision node. By selectively zeroing some features we can guarantee that any instance will follow a specific child in a given node and hence direct instances towards a target leaf.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">All instances to one child</head><p>We define decision rule at a decision node i as: "if w T i z + b i ≥ 0 then go to right child, else go to left child", where w i ∈ R F is the weight vector and b i ∈ R is the bias<ref type="foot" target="#foot_0">1</ref> . We also describe a mask for node i, that divert all instances to one child, we call it N M = {µ × , µ + }. N M works as follows. Write w and z as w = (w 0 w − w + ) and z = (z 0 z − z + ), where w 0 = 0, w − &lt; 0 and w + &gt; 0 contain the zero, negative and positive weights in w, and z ≥ 0<ref type="foot" target="#foot_1">2</ref> is arranged according to that. Call S 0 , S − and S + the corresponding sets of indices in w.</p><formula xml:id="formula_6">Then w T z + b = w T − z − + w T + z + with w T − z − ≤ 0 and w T + z + ≥ 0. So if z − = 0 then w T z + b ≥</formula><p>0 and z would go to the right child and if z + = 0 then w T z + b &lt; 0 and z would go to the left child. Hence, N M defined as follows: to go left, µ × ∈ {0, 1} F is a binary vector containing ones at S − , zeros at S + and * (meaning any value) at S 0 ; and µ + ≥ 0 is a vector containing small positive values at S − and zero elsewhere. To go right, exchange "−" and "+" in the procedure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Masks</head><p>We now show how to construct masks that effect a certain class outcome. For each case, we state the desired goal and the corresponding mask. In the manipulations below we may use N M repeatedly over several nodes to construct the mask (which is applied to the feature vector and hence applies globally to each node). In that case, we will only use the multiplicative mask produced by N M at each node, and create the additive mask at the end given the final multiplicative mask. to the parent of each leaf of k and combine the resulting multiplicative masks as extended-AND (defined below). Finally, add the additive mask.</p><formula xml:id="formula_7">A k 1 k 2 : let k 1 = k 2 ∈ {1, . . . ,</formula><p>A k: let k ∈ {1, . . . , K}. Classify all instances x as class k. Mask: find the path from the root to the leaf of class k. At each node i in the path, apply N M (to divert instances along the path) and keep the multiplicative mask only. The final multiplicative mask, elementwise, has a 0 where any of the node masks has a 0, a 1 where all node masks have no 0s but at least one 1, and * elsewhere. This masks out all the "undesired" features that might divert us from the path. Equivalently, this is the logical extended-AND of all the multiplicative masks along the path (where we extend AND to mean AND( * , 0) = 0, AND( * , 1) = 1 and AND( * , * ) = * ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head><p>We have evaluated our masks thoroughly on two deep nets. 1) VGG16 <ref type="bibr" target="#b28">[29]</ref> in a subset of 16 classes of ImageNet <ref type="bibr" target="#b27">[28]</ref>, for which we select the F = 8 192 neurons from its last convolutional layer. 2) LeNet5 in MNIST on 10 digit classes <ref type="bibr" target="#b29">[30]</ref>, for which we select the F = 800 neurons at layer conv2 as features. For both of them, we can train trees that accurately mimick the deep net classifier g. The trees give remarkable insight in the relation of deep net features to classes and allow us to construct masks that indeed work as intended in the deep net for most instances. Here, we focus on VGG16.</p><p>Our VGG16 net achieves an error of 0.2% (training) and 6.79% (test). To train the tree, we use as initial tree a deep enough, complete binary tree with random parameters, and run TAO for a range of increasing λ values. From there, we pick a tree with accuracy close to that of the deep net but as sparse as possible, which we will use as mimick. This tree (λ = 1) has an error of 0% (training) and 7.90% (test); it has 39 nodes and uses just 1 366 features (17% of the total 8 192). We normalize the final tree so each node weight vector has norm 1. We also discuss a tree of somewhat lower accuracy but which has exactly one leaf per class (fig. <ref type="figure">1</ref>). This tree (λ = 33)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Manipulating the deep net features via masks</head><p>We derive masks using the mimick tree (λ = 1). Fig. <ref type="figure" target="#fig_1">2</ref> shows confusion matrices for VGG16, over test instances (see fig. <ref type="figure">5</ref> in the appendix for training instances). As shown in the confusion matrix for deep net vs tree prediction (second matrix in the top left), both models have the same prediction for almost all instances, showing the tree mimick the network really well. This was expected as the tree have training and test errors close to those of VGG16. The interesting confusion matrix is the original network vs network with only the feature selected by the tree (top middle). Here, even after using only 17% of the features, the network has the same prediction as to the original one. It suggests that 83% of the features and hence neurons and weights of the net are practically redundant, or perhaps code for properties that are useful for only a few specific instances. This is not surprising if one notes that deep nets (at least, as presently designed) seem to be vastly overparameterized and can be significantly compressed.</p><p>Generally, the masks affect the deep net classification in the same way as the tree. This is to be expected since the tree has a very similar error and confusion matrix as the net, but it is still surprising in how well it works in most cases, like for classifying all instances as class k mask (bottom row). This also indicates that certain deep net neurons (those critically involved in the masks) play a well-defined role in the classification. The number of features that a mask critically needs to perform its job is very small, around 200 (out of 8 192); for MNIST (see fig. <ref type="figure">6</ref> and 7 in the appendix) it is much smaller, around 40 (out of 800). Misclassifying class k 1 as k 2 (where k 1 must have a single leaf which is a sibling of k 2 ) works well too (top right), although a few instances from other classes are sometimes classified as k 2 . Not classifying any instance as class k (middle row) works also well but fails with some instances, which remain as class k. The confusion matrices for MNIST (not shown) are very similar. We also demonstrate this masking operation on an image which is not in the dataset for the VGG16 network in the appendix.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Inspecting the sparse oblique trees</head><p>Fig. <ref type="figure">1</ref> shows a very interesting tree, obtained for a larger λ value so that there is exactly one leaf per class (the smallest number of leaves possibly unless we ignore classes). This tree has very few nonzero weights yet its test error is reasonable, so it probably extracts features that robustly classify most images. Also, its structure remains unchanged for a wide range of λ. Inspecting it shows an intuitive hierarchy of classes that seem primarily related to the background or surroundings of the main object in the image. Its leftmost subtree {warplane, airliner, school bus, fire engine, sports car} consists of man-made objects often found on roads. However, {container ship, speedboat} (man-made objects found on the sea) appears in the rightmost subtree, together with {killer whale, bald eagle, coral reef}, all of which are also typically found on the sea or on the air. Yet {goldfish} appears in a single subtree quite separate from all other classes: indeed, this fish is found on fishbowls (not the sea) in the training images. A subtree in the middle contains animals in land natural environments (forest, snow, grass, etc.): {tiger cat, white wolf, goose, Siberian husky, lion}. And so on. This is consistent with previous works that have found that, in some specific cases, the reason why a deep net classifies an object as a certain class is caused by the background or more generally by some confounding variables <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b30">31]</ref>. It points to a possible vulnerability of the net, in that it may misclassify an object that happens to appear in an unusual background (say, a bald eagle standing on a road).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>Our paper demonstrates the use of sparse oblique decision trees as a powerful "microscope" to investigate the behavior of deep nets, by learning interpretable yet accurate trees that mimick the classifier part of a deep net. Using the TAO algorithm is critical for this to succeed. The resulting tree gives insights about the relation between neurons and classes, and enables the design of simple manipulations of the neuron activations that can, for any training or test instance, change the class predicted in various, controllable ways (thus making adversarial attacks possible at the level of the deep net features). . For example, for the LeNet5 neural net of <ref type="bibr" target="#b29">[30]</ref> in the diagram, this corresponds to the first 4 layers (convolutional and subsampling) followed by the last 2, fully-connected layers, respectively. The "neural net feature" vector z consists of the activations (outputs) of F neurons, and can be considered as features extracted by the neural net from the original features x (pixel values, for LeNet5). We use a sparse oblique tree to mimic the classifier part y = g(z), by training the tree using as input the neural net features z and as output the corresponding ground-truth labels.   </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Classifier mimicking</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Tree to mimic LeNet5 features</head><formula xml:id="formula_8">k = 8 k = 9 k =A k =B k =C k =D k =D k =E</formula><formula xml:id="formula_9">k = 8 k = 9 k =A k =B k =C k =D k =D k =E</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Illustration of the masks with an actual image</head><p>Fig. <ref type="figure" target="#fig_7">8</ref> illustrates the mask behavior in an image not in the dataset. The middle column histograms show the deep net features (grouped by class). In each row, the top histogram shows the feature values, and the bottom histogram shows the number of features selected for each class. Next, we show how masking the features drastically alters in a controlled way the softmax output. In row 2, when we apply the A "S " mask, the network now classifies the image as "siberian husky". Similarly, in row 5, when we apply the A " E " mask, the network now classifies the original image as "bald eagle" with large confidence, compared to row 1, where without the mask the softmax value for "bald eagle" is close to zero. We also show how the mask correlates with superpixels (perceptual groups of pixels obtained by oversegmentation) in the image, either manually cropped (row 3) or optimized to invert the desired deep net features (row 4).</p><p>To obtain results like those above, the general procedure is as follows. Firstly (in an offline phase), we train the tree mimic and construct a subset of features S k for each class k, using the A k mask. This defines a score for an input image x as s k (x) = i∈S k F i (x), where F i (x) is the feature i computed by the deep neural net for x. We can then discard the tree and the classifier part of the deep net. All we need is the feature-extraction part of the deep net and the class sets S 1 , . . . , S K .</p><p>Then (in an online phase), given an input image and a target class k, we split the image into superpixels (using some oversegmentation algorithm), compute the score for each superpixel, and report the superpixels with lowest score (most salient).  Column 1 shows the image masks (when available). Column 2 summarizes the 8 192 feature values as two histograms: on the upper panel, the number of features in each class group (listed in the X axis as 0-F, where " * " means features not used by the tree); on the lower panels, the average feature value (neuron activation) per class group. Column 3 shows the histogram of corresponding so max values. Row 1 shows the original image. Row 2 shows a mask in feature space to classify it as "Siberian husky". Row 3 shows a mask manually cropped in the image, whose features resemble those of row 2. Row 4 shows a mask in feature space obtained by finding the top-3 superpixels whose features most resemble those of the masked features of row 2. Row 5 shows a mask in feature space to classify the image as "bald eagle".</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 : 9 0</head><label>19</label><figDesc>Figure 1: Tree having one leaf per class (λ = 33). At each decision node we show its weight vector, node index and bias (always zero). At each leaf we show their index, class label, an image of from their class and class description in the format: class description (class label). We plot the weight vector, of dimension 8 192, as a 91×91 square (the last pixels are unused), with features in the original order in VGG16 (which is determined during training and arbitrary, hence the random aspect of the images), and colored according to their sign and magnitude (positive, negative and zero values are blue, red and white, respectively). ground truth vs features selected . . . . A k 1 k 2 . . . . 
deep net vs tree by the tree 8 → EE → 8A → BB → A9 → CC → 9</figDesc></figure>
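<div xmlns="http://www.tei-c.org/ns/1.0"><p>A sketch of the two-phase procedure of appendix D above (our illustration; the paper does not specify how a superpixel's score is computed, so blanking out one superpixel at a time is an assumption):</p><code lang="python">
import numpy as np

def class_score(F_extract, S_k, x):
    """s_k(x): sum of the deep-net features in the offline-built set S_k."""
    z = F_extract(x)              # feature-extraction part of the net only
    return float(z[list(S_k)].sum())

def rank_superpixels(F_extract, S_k, x, superpixel_masks):
    """Score superpixels; blanking one out at a time is our assumption."""
    scores = []
    for m in superpixel_masks:    # boolean pixel masks from oversegmentation
        scores.append(class_score(F_extract, S_k, x * (1 - m)))
    return np.argsort(scores)     # lowest score first = most salient
</code></div>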
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Confusion matrices for VGG (test set). Top le : ground-truth vs deep net, and deep net vs tree. Top middle: deep net vs deep net with only the features selected by the tree. Top right: A k 1 k 2 (selected examples). Middle: N k. Bottom: A k.The confusion matrices for the training set (not shown) are very similar.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Mimicking part of a neural net with a decision tree. The figure shows the neural net y = f (x) = g(F(x)), considered as the composition of a feature extraction part z = F(x) and a classifier part y = g(z). For example, for the LeNet5 neural net of<ref type="bibr" target="#b29">[30]</ref> in the diagram, this corresponds to the first 4 layers (convolutional and subsampling) followed by the last 2, fully-connected layers, respectively. The "neural net feature" vector z consists of the activations (outputs) of F neurons, and can be considered as features extracted by the neural net from the original features x (pixel values, for LeNet5). We use a sparse oblique tree to mimic the classifier part y = g(z), by training the tree using as input the neural net features z and as output the corresponding ground-truth labels.</figDesc><graphic coords="10,89.28,145.23,416.44,120.76" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Tree selected as mimic for LeNet5 features (λ = 20).At each decision node we show weight vector and average of training instances at each leaf; we show the node index, bias (always zero) and, for leaves, their label. We plot the weight vector, of dimension 800, as a 29×29 square (the last pixels are unused), with features in the original order in LeNet5 (which is determined during training and arbitrary, hence the random aspect of the images), and colored according to their sign and magnitude (positive, negative and zero values are blue, red and white, respectively). You may need to zoom in the plot.For MNIST, our LeNet5 architecture achieves an error of 0.00545% (training) and 0.61% (test). We selected as mimick the tree for λ = 20, with depth 5 and only 27 nodes. It has an error of 1.28% (training) and 1.67% (test), which is very close to that of LeNet5.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>. . . . . . . . . . . . . . . . . . . . . . A k . . . . . . . . . . . . . . . . . . . . . . . . .</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :Figure 6 :Figure 7 :</head><label>567</label><figDesc>Figure 5: Like fig. 2 but for the training set.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8:Illustration of masks for a particular image (VGG16 network on ImageNet subset). Column 1 shows the image masks (when available). Column 2 summarizes the 8 192 feature values as two histograms: on the upper panel, the number of features in each class group (listed in the X axis as 0-F, where " * " means features not used by the tree); on the lower panels, the average feature value (neuron activation) per class group. Column 3 shows the histogram of corresponding so max values. Row 1 shows the original image. Row 2 shows a mask in feature space to classify it as "Siberian husky". Row 3 shows a mask manually cropped in the image, whose features resemble those of row 2. Row 4 shows a mask in feature space obtained by finding the top-3 superpixels whose features most resemble those of the masked features of row 2. Row 5 shows a mask in feature space to classify the image as "bald eagle".</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>K}. For any instance originally classified as k 1 , classify it as k 2 . For any other instance, do not alter its classification. This case only works if the classes k 1 and k 2 are leaf siblings (have the same parent). Class k 2 may be represented by multiple leaves since we only need to deal with one of them (the sibling of k 1 ). Mask: simply apply N M to the parent of the leaves of k 1 and k 2 . For instance, if class k 1 is left child, then final multiplicative mask µ × will contain ones at S + , zeros at S − and * (meaning any value) at S 0 .</figDesc><table /><note>N k: let k ∈ {1, . . . , K}. For any instance originally classified as k, classify it as any other class. For any other instance, do not alter its classification. Mask: simply apply N M</note></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The bias (bi) at each decision node i of the tree is zero. This holds very well in the trees we trained, specifically |bi| ≪ wi at each decision node.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">We assume the deep net features are nonnegative: z = F(x) ≥ 0. This is true for ReLUs, which are used in most deep nets at present.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Sparse oblique decision trees: A tool to understand and manipulate neural net features</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Hada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>Carreira-Perpiñán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zharmagambetov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Data Mining and Knowledge Discovery</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Visualizing and understanding convolutional networks</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Zeiler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 13th European Conf. Computer Vision (ECCV&apos;14)</title>
				<meeting>13th European Conf. Computer Vision (ECCV&apos;14)<address><addrLine>Zürich, Switzerland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="818" to="833" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Inverting visual representations with convolutional networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dosovitskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brox</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2016 IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR&apos;16)</title>
				<meeting>of the 2016 IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR&apos;16)<address><addrLine>Las Vegas, NV</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Visualizing deep convolutional neural networks using natural pre-images</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mahendran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. J. Computer Vision</title>
		<imprint>
			<biblScope unit="volume">120</biblScope>
			<biblScope unit="page" from="233" to="255" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Visualizing Higher-Layer Features of a Deep Network</title>
		<author>
			<persName><forename type="first">D</forename><surname>Erhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vincent</surname></persName>
		</author>
		<idno>1341</idno>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
		<respStmt>
			<orgName>Université de Montréal</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Deep inside convolutional networks: Visualising image classification models and saliency maps</title>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2nd Int. Conf. Learning Representations (ICLR 2014)</title>
				<meeting>of the 2nd Int. Conf. Learning Representations (ICLR 2014)<address><addrLine>Banff, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Synthesizing the preferred inputs for neurons in neural networks via deep generator networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dosovitskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yosinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brox</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clune</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems (NIPS)</title>
				<editor>
			<persName><forename type="first">D</forename><forename type="middle">D</forename><surname>Lee</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Sugiyama</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Luxburg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Guyon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<meeting><address><addrLine>Cambridge, MA</addrLine></address></meeting>
		<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="3387" to="3395" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Explaining nonlinear classification decisions with deep Taylor decomposition</title>
		<author>
			<persName><forename type="first">G</forename><surname>Montavon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lapuschkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Binder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<biblScope unit="page" from="211" to="222" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Net2Vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Fong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2018 IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR&apos;18)</title>
				<meeting>of the 2018 IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR&apos;18)<address><addrLine>Salt Lake City, UT</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="8730" to="8738" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Grad-CAM: Visual explanations from deep networks via gradient-based localization</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Selvaraju</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cogswell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vedantam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Parikh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Batra</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 16th Int. Conf. Computer Vision (ICCV&apos;17)</title>
				<meeting>16th Int. Conf. Computer Vision (ICCV&apos;17)<address><addrLine>Venice, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="618" to="626" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Learning important features through propagating activation differences</title>
		<author>
			<persName><forename type="first">A</forename><surname>Shrikumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Greenside</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kundaje</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 34th Int. Conf. Machine Learning (ICML 2017)</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Precup</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><forename type="middle">W</forename><surname>Teh</surname></persName>
		</editor>
		<meeting>of the 34th Int. Conf. Machine Learning (ICML 2017)<address><addrLine>Sydney, Australia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="3145" to="3153" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Why should I trust you?&quot;: Explaining the predictions of any classifier</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Guestrin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (SIGKDD 2016)</title>
				<meeting>of the 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (SIGKDD 2016)<address><addrLine>San Francisco, CA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1135" to="1144" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Survey and critique of techniques for extracting rules from trained artificial neural networks</title>
		<author>
			<persName><forename type="first">R</forename><surname>Andrews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Diederich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">B</forename><surname>Tickle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="373" to="389" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">IBM SPSS Modeler Cookbook</title>
		<author>
			<persName><forename type="first">K</forename><surname>Mccormick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Abbott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Khabaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Mutchler</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>Packt Publishing</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A survey of methods for explaining black box models</title>
		<author>
			<persName><forename type="first">R</forename><surname>Guidotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Monreale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ruggieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Turini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Giannotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pedreschi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page">93</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Rule generation from neural networks</title>
		<author>
			<persName><forename type="first">L</forename><surname>Fu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Systems, Man, and Cybernetics</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="1114" to="1124" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Extracting refined rules from knowledge-based neural networks</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">G</forename><surname>Towell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Shavlik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="71" to="101" />
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Using sampling and queries to extract rules from trained neural networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Craven</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Shavlik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 11th Int. Conf. Machine Learning (ICML&apos;94)</title>
				<meeting>of the 11th Int. Conf. Machine Learning (ICML&apos;94)</meeting>
		<imprint>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="37" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Extracting tree-structured representations of trained networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Craven</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Shavlik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems (NIPS)</title>
				<editor>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Touretzky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Mozer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Hasselmo</surname></persName>
		</editor>
		<meeting><address><addrLine>Cambridge, MA</addrLine></address></meeting>
		<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="24" to="30" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Knowledge discovery via multiple models</title>
		<author>
			<persName><forename type="first">P</forename><surname>Domingos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Intelligent Data Analysis</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="187" to="202" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Classification and Regression Trees</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">J</forename><surname>Breiman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Friedman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Olshen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Stone</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1984">1984</date>
			<pubPlace>Wadsworth, Belmont, Calif</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Programs for Machine Learning</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Quinlan</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1993">1993</date>
			<publisher>Morgan Kaufmann</publisher>
			<biblScope unit="volume">4</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Alternating optimization of decision trees, with application to learning sparse oblique trees</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>Carreira-Perpiñán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tavallali</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems (NEURIPS)</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Wallach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Grauman</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Cesa-Bianchi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<meeting><address><addrLine>Cambridge, MA</addrLine></address></meeting>
		<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="1211" to="1221" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">The Tree Alternating Optimization (TAO) algorithm: A new way to learn decision trees and tree-based models</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>Carreira-Perpiñán</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">ArXiv</note>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">An experimental comparison of old and new decision tree algorithms</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zharmagambetov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Hada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>Carreira-Perpiñán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gabidolla</surname></persName>
		</author>
		<idno>ArXiv:1911.03054</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Ensembles of bagged TAO trees consistently improve over random forests, AdaBoost and gradient boosting</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>Carreira-Perpiñán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zharmagambetov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2020 ACM-IMS Foundations of Data Science Conference (FODS 2020)</title>
				<meeting>of the 2020 ACM-IMS Foundations of Data Science Conference (FODS 2020)<address><addrLine>Seattle, WA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="35" to="46" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Smaller, more accurate regression forests using tree alternating optimization</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zharmagambetov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>Carreira-Perpiñán</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 37th Int. Conf. Machine Learning (ICML 2020)</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Daumé</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Iii</forename></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Singh</surname></persName>
		</editor>
		<meeting>of the 37th Int. Conf. Machine Learning (ICML 2020)</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="11398" to="11408" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">ImageNet: A large-scale hierarchical image database</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fei-Fei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2009 IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR&apos;09)</title>
				<meeting>of the 2009 IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR&apos;09)<address><addrLine>Miami, FL</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="248" to="255" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Very deep convolutional networks for large-scale image recognition</title>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 3rd Int. Conf. Learning Representations (ICLR 2015)</title>
				<meeting>of the 3rd Int. Conf. Learning Representations (ICLR 2015)<address><addrLine>San Diego, CA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Gradient-based learning applied to document recognition</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Haffner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE</title>
				<meeting>IEEE</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="volume">86</biblScope>
			<biblScope unit="page" from="2278" to="2324" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Zech</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Badgeley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">B</forename><surname>Costa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Titano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">K</forename><surname>Oermann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PLoS Medicine</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page">e1002683</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
