<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Artificial Representative Trees as Interpretable Surrogates for Random Forests</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Lea</forename><forename type="middle">Louisa</forename><surname>Kronziel</surname></persName>
							<email>l.kronziel@uni-luebeck.de</email>
							<affiliation key="aff0">
								<orgName type="department">Institute of Medical Biometry and Statistics</orgName>
								<orgName type="institution" key="instit1">University of Luebeck</orgName>
<orgName type="institution" key="instit2">University Hospital Schleswig-Holstein - Campus Luebeck</orgName>
								<address>
									<addrLine>Ratzeburger Allee 160</addrLine>
									<postCode>V24, 23562</postCode>
									<settlement>Lübeck</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Artificial Representative Trees as Interpretable Surrogates for Random Forests</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">13F383DC406C2236209DEA23BFAEF2D9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:36+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Random Forest</term>
					<term>Surrogate Model</term>
					<term>Machine Learning</term>
					<term>Interpretability</term>
					<term>Most Representative Tree</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Random forests (RFs) are a popular machine learning method with good prediction performance, but their ensemble structure makes them difficult to interpret. To interpret RFs, decision trees can be used as surrogate models that preserve the tree structure. Alternatively, a single tree of the RF can be selected as a surrogate model, so that a direct part of the RF is used rather than a merely similar model. For this purpose, the most representative tree (MRT) is selected, i.e., the tree that is most similar to all other trees of the RF. However, MRTs carry a risk of misinterpretation due to non-informative early splits. To overcome this, the research in my PhD thesis focuses on developing an algorithm for artificial representative trees (ARTs) and comparing them with MRTs and decision trees as surrogate models using simulation studies as well as benchmark data. First results show a promising improvement in predictive quality and interpretability when comparing ARTs with MRTs.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Random forests (RFs) are a well-known and efficient machine learning (ML) algorithm for creating predictive models, especially for tabular data <ref type="bibr" target="#b0">[1]</ref>. For example, RFs can be used to analyze high-dimensional molecular or genetic data <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref> as well as to enable individualized treatment options for patients in the context of precision medicine <ref type="bibr" target="#b3">[4]</ref>. They consist of an ensemble of decision trees <ref type="bibr" target="#b4">[5]</ref>, whereby the decisions of a single tree are understandable. However, it is difficult for a human to understand the decisions of the RF in detail, which is why RFs are often referred to as black box models. Despite good prediction performance, this can be a barrier to the use of these methods in practice <ref type="bibr" target="#b5">[6]</ref>.</p><p>There are various approaches to making the decisions of such complex models understandable and thereby enabling an interpretation of the model. This includes understanding the individual predictions and which variables influence these predictions. For example, post hoc approaches such as partial dependence plots can be used to determine which variables influence the prediction of a model. In addition, variable importance can measure how important the variables are for the prediction performance <ref type="bibr" target="#b6">[7]</ref>. Alternatively, a surrogate model that is easier to interpret, such as a decision tree, can be developed instead. To ensure that the surrogate model is as similar as possible to the original model, it is usually trained to make the same predictions as the black box model, as was done in <ref type="bibr" target="#b7">[8]</ref>. However, there are also other approaches for using decision trees as surrogates.
For example, a locally adapted decision tree is created in <ref type="bibr" target="#b8">[9]</ref>, which can be used to explain the prediction of a specific observation.</p><p>If an RF is to be interpreted, a decision tree as a surrogate model has the advantage that the tree structure is preserved. However, for surrogate models trained for high predictive similarity, it cannot be guaranteed that they actually use the same decisions as the original model. Instead of training a surrogate model, it has been suggested to select one decision tree from the ensemble of the RF to be interpreted as a surrogate model <ref type="bibr" target="#b9">[10]</ref>. The decision tree that represents the RF best should be used and is therefore referred to as the most representative tree (MRT). This method has the particular advantage that the structure of the surrogate model and its decisions correspond directly to a part of the RF and are not just similar to it. MRTs therefore combine the predictive performance and interpretability of decision trees with the stability of RFs. They also offer the advantage that they can be used more easily for external validation: a single MRT can be printed in a publication, while an RF can only be made available as an object of the programming language used or via a website interface.</p></div>
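The surrogate idea above can be illustrated with a minimal, self-contained sketch (plain Python; `black_box`, `fit_stump`, and all other names are hypothetical stand-ins, not from the cited work). The key point is that the surrogate is fitted to the black-box model's outputs, not to the original labels:

```python
# Illustrative sketch: fit a one-split surrogate ("stump") to the
# predictions of a black-box model rather than to the original labels.
# `black_box` stands in for an opaque model such as an RF.

def black_box(x):
    # Stand-in for an opaque model's prediction function.
    return 2.0 if x >= 0.5 else -1.0

def fit_stump(xs, targets):
    """Find the split point and leaf values minimizing squared error."""
    best = None
    for cut in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x < cut]
        right = [t for x, t in zip(xs, targets) if x >= cut]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = sum((t - lmean) ** 2 for t in left) + \
              sum((t - rmean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, cut, lmean, rmean)
    return best  # (sse, split_point, left_value, right_value)

xs = [i / 10 for i in range(11)]       # toy 1-D training data
targets = [black_box(x) for x in xs]   # surrogate mimics the black box
sse, cut, lval, rval = fit_stump(xs, targets)
```

Because the stump is trained on the black box's predictions, it recovers the model's own decision boundary (here at 0.5) rather than whatever structure the raw labels would suggest.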
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Key related work</head><p>The idea of using a decision tree of the RF ensemble as a surrogate model was first reported in <ref type="bibr" target="#b9">[10]</ref>. In their approach, the MRT is selected as the tree that is most similar on average to the other trees in the ensemble. Three distance metrics are proposed to calculate the pairwise similarities between the decision trees. For example, the difference in the predictions for a test data set is calculated for each pair of trees. Alternatively, the similarity can be measured by whether the observations of a test data set are assigned to the same terminal nodes of two trees. The third metric measures pairwise similarity via the proportion of split variables used in both trees. However, it does not consider where in the tree the variables are used: whether a variable was used as the first split variable in one tree and the last in the other is not taken into account. In addition, it ignores whether a split variable is used more than once in the trees.</p><p>These two aspects are addressed in the further development of this measure in <ref type="bibr" target="#b10">[11]</ref>. The influence of a split variable on the similarity is weighted depending on its position in the tree, which is why the measure is called weighted splitting variables (WSV). In <ref type="bibr" target="#b10">[11]</ref>, the three measures from <ref type="bibr" target="#b9">[10]</ref> and the WSV measure were used for the selection of an MRT and their performance was compared. The results showed that the structure of the RF can be best represented with the WSV measure, as the predictions on a validation data set were most similar to the ones from the RF.</p><p>In <ref type="bibr" target="#b11">[12]</ref>, MRTs are also selected, but the authors suggest that in some cases it is better to use more than one MRT as a representation for an RF.
To obtain this small ensemble of representative trees, the pairwise distances of the decision trees are clustered using the partitioning around medoids (PAM) algorithm. To do this, it is necessary to specify in advance how many clusters, and thus MRTs, are to be found. In addition, the aspect of interpretability was not investigated, and no analyses were performed to determine whether the prediction quality actually changes depending on the number of MRTs.</p><p>However, MRTs have a disadvantage. When creating the RF, not all variables from the training data set are available to the trees at each split <ref type="bibr" target="#b4">[5]</ref>. A random subset of the variables is drawn in each node, which means that at some splits potentially only noise variables are available for splitting. This can result in uninformative splits that do not improve the prediction quality of the RF but lead to deeper trees than necessary. Such uninformative splits can also occur in the selected MRTs. In addition, important variables are not necessarily used as top splits at the root. Decision trees, and thus MRTs, are easier for a human to interpret if they only consist of a few splits. To overcome this problem, artificial representative trees (ARTs) should be created.</p></div>
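The prediction-based MRT selection described above can be sketched as follows (illustrative Python; real decision trees are replaced by simple one-dimensional threshold functions, and all names are hypothetical). The MRT is the tree with the smallest average prediction distance to all other trees of the ensemble:

```python
# Illustrative MRT selection: each "tree" is reduced to its prediction
# function; the MRT minimizes the average pairwise prediction distance
# (mean squared difference of predictions on a test set).

def make_stump(threshold, lo, hi):
    return lambda x: hi if x >= threshold else lo

# A toy "forest" of three trees; tree 1 sits between the other two.
forest = [
    make_stump(0.4, 0.0, 1.0),
    make_stump(0.5, 0.0, 1.0),
    make_stump(0.6, 0.0, 1.0),
]

test_x = [i / 100 for i in range(100)]

def pred_distance(t1, t2, xs):
    """Mean squared difference of the two trees' predictions."""
    return sum((t1(x) - t2(x)) ** 2 for x in xs) / len(xs)

# Average distance of each tree to all other trees in the ensemble.
avg_dist = [
    sum(pred_distance(t, other, test_x) for other in forest if other is not t)
    / (len(forest) - 1)
    for t in forest
]
mrt_index = avg_dist.index(min(avg_dist))
```

With the three stumps at thresholds 0.4, 0.5, and 0.6, the middle tree is selected as the MRT, since it is closest on average to the other two.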
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Specific research questions, hypothesis and objectives</head><p>Resulting from the open topics from the previous section, my research will focus on the following four objectives: 1. To develop and evaluate an algorithm that generates an ART based on an existing RF.</p><p>Analogous to the MRT, a surrogate model is to be created which can be interpreted instead of the RF.</p><p>To evaluate whether ARTs provide better interpretability compared to MRTs. The hypothesis is that ARTs are not as deep as MRTs and use a larger proportion of effect variables and a smaller proportion of noise variables while having comparable prediction performance.</p><p>(WP 1 &amp; WP 2) 2. To compare ARTs with other surrogate models with regard to the same criteria as for the first objective, to assess whether the higher effort for generating ARTs compared to classical surrogate models is worthwhile. I will also investigate which of the methods should be preferred for which type of data set. (WP 3) 3. To compare ensembles of ARTs and MRTs, based on the clustering approach from <ref type="bibr" target="#b11">[12]</ref>, with regard to the same criteria as the previous objectives. (WP 4) 4. To investigate whether tree algorithms other than classic CART <ref type="bibr" target="#b12">[13]</ref> can improve the prediction performance and interpretability of ARTs. Binary splits are often used in RFs, as splits can be concatenated to any depth. However, trees whose nodes have more than two child nodes are easier to understand than a deep concatenation of several binary splits. (WP 5)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Research approach, methods, and rationale for testing the research hypothesis</head><p>The following five work packages (WP) are defined to achieve the four objectives above.</p><p>WP 1 To create an ART, a new decision tree is grown iteratively using a greedy algorithm. First, all stumps that are possible with the available training data are created. The similarity to the RF is calculated for all stumps, and the one with the greatest similarity to the RF is used. Analogous to MRTs, various measures can be used to define similarity, such as the split variables used or the prediction errors. Then, in each additional iteration, all trees are created that are possible with exactly one more split. If one or more of these trees improves the similarity, or improves the prediction while the similarity remains the same, the one with the greatest improvement is selected and a new iteration is started; otherwise, the algorithm stops. WP 2 To compare the use of a single ART with a single MRT, I will first perform simulation studies. The advantage of simulation studies is that the relationships between the predictor variables and with the target variable are fully known. For the performance comparison of an ART and an MRT, I will initially consider only regression problems using the same structure as in <ref type="bibr" target="#b10">[11]</ref>. In the first scenario, the data set consists exclusively of binary variables, with a small number of effect variables with large effects. The other scenarios represent variations of this, for example by using many effect variables with lower effect sizes, correlated variables, and interaction effects. The last scenario uses continuous variables. As quality measures, I will compare the prediction performance and consider the split variables used as well as the tree depth.
The deviation of the predictive performance of the ARTs and MRTs from the RF is calculated using the MSE. In addition, the proportion of splits that use noise variables is measured, which is referred to as the false discovery rate (FDR). It is also measured how many effect and noise variables are used as split variables, as well as the runtime. Afterward, ARTs and MRTs will be compared on a benchmark data set from OpenML (https://openml.org/). Analogous to the simulation study, the deviation of the MSE and the tree depth are measured. In addition, R² and the Akaike information criterion (AIC) are estimated. WP 3 I will perform extensive simulation studies analogous to WP 2 to compare ARTs with a decision tree as a surrogate model. Then the ARTs and decision trees will be applied to clinical or benchmark data sets. I will enlarge the simulation studies to cover more complex designs (e.g., classification problems and high-dimensional data) so that they are more similar to clinical use cases. For the performance comparison, I will additionally focus on the stability of the results of ARTs and decision trees. For this, I will compare the similarity of several ARTs and decision trees that were created on the same data. The various measures from <ref type="bibr" target="#b9">[10]</ref> and WSV from <ref type="bibr" target="#b10">[11]</ref> will be used as similarity measures so that the similarity is evaluated with respect to various aspects. WP 4 For the comparison of ensembles of MRTs and ARTs based on a clustering of the trees of the RF, I will first extend the approach of <ref type="bibr" target="#b11">[12]</ref>. For example, I will integrate an automatic selection of the number of clusters using the improvement in prediction quality. As long as the prediction quality increases by adding a cluster, and thus a representative tree, the number of clusters will be further increased.
For the simulation study and benchmark data application, data sets containing latent subgroups will be used, as in these cases it is assumed that an ensemble of representative trees is more suitable than a single one. The remaining procedure for the simulation will be done in the same way as for WP 2. This will also increase the focus on predictive quality and interpretability as quality measures compared to <ref type="bibr" target="#b11">[12]</ref> for MRTs. WP 5 To obtain ARTs that are not restricted to binary splits, the ART algorithm from WP 1 will be extended. For example, splitting with the same variable several times in succession could be favored by a higher weighting. As soon as the split variable of a node is used a second time for splitting in the child node, the two nodes can be combined into a single node with more than two child nodes. I will again investigate the performance of this approach using simulations. In addition, I will vary the hyperparameter for weighting the repeated splits with the same variable to examine its influence on the quality criteria mentioned in WP 2. The tree depth and the number of terminal nodes will also be compared.</p></div>
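The interpretability measures from WP 2 can be made concrete with a small sketch (illustrative Python; variable names are hypothetical): the FDR is the fraction of splits using noise variables, and the coverage is the fraction of all simulated noise variables that appear at least once in the surrogate tree.

```python
# Illustrative computation of the interpretability measures from WP 2.
# All names are hypothetical examples, not from the actual study.

def split_quality(split_vars, noise_vars):
    """FDR = fraction of splits using a noise variable;
    coverage = fraction of all noise variables used at least once."""
    n_noise_splits = sum(1 for v in split_vars if v in noise_vars)
    fdr = n_noise_splits / len(split_vars)
    covered = {v for v in split_vars if v in noise_vars}
    noise_coverage = len(covered) / len(noise_vars)
    return fdr, noise_coverage

noise_vars = {"n1", "n2", "n3", "n4"}

# Split variables used by a hypothetical surrogate tree (with repeats).
split_vars = ["x1", "x2", "n1", "x1"]

fdr, cov = split_quality(split_vars, noise_vars)
```

With one of four splits using a noise variable and one of four noise variables covered, both measures are 0.25 in this toy example; a good surrogate keeps both close to zero.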
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results and contributions to date</head><p>The aim of developing an ART algorithm was successfully realized; the result is shown in Algorithm 1. We integrated the implementation of the algorithm into the R package timbR (https://github.com/imbs-hl/timbR), which is based on trees built with the R package ranger <ref type="bibr" target="#b13">[14]</ref>. ARTs can now be used for global interpretation of the RF so that, for example, a physician can understand the model's predictions. ARTs also enable local interpretability, so that the individual decisions of the ART can be compared with established knowledge from the literature or discussed with a physician with respect to their medical plausibility. ARTs can also be used as a prediction model with the option of interpreting individual predictions.</p><p>I have performed the simulation study mentioned in WP 2 and the comparison of an ART and MRT using the benchmark data. The ART was superior in terms of interpretability and the use of fewer noise variables. In Figure <ref type="figure" target="#fig_0">1</ref>, it can be seen that the predictions of the ARTs were more similar to the RF than those of the MRTs. In addition, ARTs used almost no noise variables. However, ARTs are somewhat more conservative in the use of effect variables than MRTs (results not shown, but displayed in <ref type="bibr" target="#b14">[15]</ref>).</p><p>The manuscript was accepted as a conference paper at the XAI-2024 conference under the title "Construction of artificial most representative trees by minimizing tree-based distance measures" <ref type="bibr" target="#b14">[15]</ref>. This study was funded by the Medical Section of the University of Lübeck (J01-2024 to BL).</p><p>For WP 3, I have compared ARTs and decision trees as surrogate models in a few simply structured simulated scenarios. The MSE of the predictions and the FDR of the ARTs were smaller than those of the decision trees.
In addition, the ARTs again consist mainly of effect variables, whereas the decision trees use a higher proportion of noise variables. Nevertheless, the predictions of the decision trees were more similar to those of the RFs than those of the ARTs. However, this work is not finished yet. For the final simulation study, I will extend the simulated scenarios to several different ones and will focus on the structure of more complex clinical data. For example, I will use gene expression data as a possible application example. In addition, I will investigate both classification and regression problems.</p><p>For the ensembles of representative trees in WP 4, we compared various clustering methods, such as k-means or hierarchical clustering using Ward's method, in simulations with different numbers of MRTs. This was done as part of a master's thesis that I co-supervised. The most stable results in terms of prediction quality were provided by k-means. In addition, we used various similarity measures, of which WSV from <ref type="bibr" target="#b10">[11]</ref> provided the best prediction quality.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Expected next steps and final contribution to knowledge</head><p>The comparison between single ARTs and MRTs has been completed. ARTs were found to be a better alternative to MRTs, as their results are better and easier to interpret. The code for the simulation as well as for the ART algorithm is freely available to enable other scientists to use it easily (https://github.com/imbs-hl/ART_paper; https://github.com/imbs-hl/timbR).</p><p>Next, I will extend the performance and interpretability comparison of ARTs with decision trees as surrogate models to identify the advantages and disadvantages of both methods in different scenarios.</p><p>Additionally, I will integrate the use of ensembles of ARTs into the R package timbR and carry out the planned comparison with MRTs. We assume that prediction performance and interpretability can be further improved through the use of ensembles of ARTs.</p><p>Furthermore, I will improve performance by extending the CART-based ART algorithm so that a very easy-to-interpret model with good prediction quality is available for a wide variety of data structures.</p><p>Finally, I will apply ARTs in ongoing collaborative research projects in Neuro- and Cardiogenetics to provide interpretable models for the clinical context. For example, an ART could be used to investigate the influence and interaction of genetic variants in prediction modeling of age at onset in X-linked dystonia-parkinsonism.</p><p>In summary, the use of ARTs offers promising opportunities to develop interpretable models for the clinical context, and further research will lead to a surrogate model that is easy to use and interpret.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Comparison of the performance of ARTs (orange) and MRTs (blue) for all five simulation scenarios in 100 repetitions.
The deviation from the prediction performance of the RF was measured using the MSE. The fractions of covered noise variables were calculated as the number of noise variables that occur at least once in the surrogate model divided by the number of all simulated noise variables.</figDesc><graphic coords="6,90.91,288.57,204.24,102.12" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Algorithm 1</head><label>1</label><figDesc>Generate ART Require: random forest RF, similarity metric metric Extract all split_points from RF Reduce split_points using only important variables Build all possible stumps using split_points Estimate similarities of all stumps to RF using metric Select stump with maximum similarity → ART_candidate repeat ART ← ART_candidate Build all possible trees with one additional split using split_points Estimate similarities of all new trees to RF using metric Select new tree with maximum similarity → ART_candidate until similarity(ART_candidate) &lt; similarity(ART ) return ART</figDesc><table /></figure>
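Algorithm 1 can be mirrored in a compact sketch (illustrative Python; the actual implementation lives in the R package timbR and operates on real multivariate trees and the similarity metrics discussed above, so everything here is a hypothetical toy). A candidate "tree" is simply a set of split points on one variable, and similarity is the negative MSE between the candidate's predictions and the forest's predictions:

```python
# Toy sketch of the greedy ART construction: start with the best stump,
# then repeatedly add the split that most improves similarity to the
# forest; stop when no additional split improves it.

def forest_predict(x):
    # Stand-in for the aggregated RF prediction (a step function).
    return 0.0 if x < 0.3 else (1.0 if x < 0.7 else 3.0)

xs = [i / 50 for i in range(50)]
rf_pred = [forest_predict(x) for x in xs]
split_points = [0.1, 0.3, 0.5, 0.7, 0.9]   # "extracted from the RF"

def tree_predict(splits, x_train, y_train, x):
    """Predict the mean forest prediction within x's segment."""
    bounds = sorted(splits)
    def seg(v):
        return sum(v >= b for b in bounds)
    segment = seg(x)
    ys = [y for xt, y in zip(x_train, y_train) if seg(xt) == segment]
    return sum(ys) / len(ys)

def similarity(splits):
    preds = [tree_predict(splits, xs, rf_pred, x) for x in xs]
    return -sum((p - r) ** 2 for p, r in zip(preds, rf_pred)) / len(xs)

art = [max(split_points, key=lambda s: similarity([s]))]   # best stump
while True:
    candidates = [art + [s] for s in split_points if s not in art]
    best = max(candidates, key=similarity)
    if similarity(best) <= similarity(art):
        break
    art = best
```

On this toy forest, the greedy loop first picks the split at 0.7 (the larger jump in the forest's prediction), then adds 0.3, and stops once the candidate tree reproduces the forest's predictions exactly.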
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>Thanks to Prof. Silke Szymczak and Dr. Björn-Hergen Laabs for their supervision and support with my doctoral thesis.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Why do tree-based models still outperform deep learning on tabular data?</title>
		<author>
			<persName><forename type="first">L</forename><surname>Grinsztajn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Oyallon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2207.08815</idno>
		<idno type="arXiv">arXiv:2207.08815</idno>
		<ptr target="http://arxiv.org/abs/2207.08815" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>cs, stat</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Random forests for genomic data analysis</title>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ishwaran</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ygeno.2012.04.003</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S0888754312000626" />
	</analytic>
	<monogr>
		<title level="j">Genomics</title>
		<imprint>
			<biblScope unit="volume">99</biblScope>
			<biblScope unit="page" from="323" to="329" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A practical introduction to Random Forest for genetic association studies in ecology and evolution</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S O</forename><surname>Brieuc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Waters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Drinan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Naish</surname></persName>
		</author>
		<idno type="DOI">10.1111/1755-0998.12773</idno>
		<ptr target="https://onlinelibrary.wiley.com/doi/pdf/10.1111/1755-0998.12773" />
	</analytic>
	<monogr>
		<title level="j">Molecular Ecology Resources</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="755" to="766" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Risk controlled decision trees and random forests for precision Medicine</title>
		<author>
			<persName><forename type="first">K</forename><surname>Doubleday</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Fu</surname></persName>
		</author>
		<idno type="DOI">10.1002/sim.9253</idno>
		<ptr target="https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.9253" />
	</analytic>
	<monogr>
		<title level="j">Statistics in Medicine</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="page" from="719" to="735" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Random Forests</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
		<idno type="DOI">10.1023/A:1010933404324</idno>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="5" to="32" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Variable selection -A review and recommendations for the practicing statistician</title>
		<author>
			<persName><forename type="first">G</forename><surname>Heinze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wallisch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dunkler</surname></persName>
		</author>
		<idno type="DOI">10.1002/bimj.201700067</idno>
		<ptr target="https://onlinelibrary.wiley.com/doi/pdf/10.1002/bimj.201700067" />
	</analytic>
	<monogr>
		<title level="j">Biometrical Journal</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="page" from="431" to="449" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Interpretable Machine Learning: A Guide for Making Black Box Models Explainable</title>
		<author>
			<persName><forename type="first">C</forename><surname>Molnar</surname></persName>
		</author>
		<ptr target="https://christophm.github.io/interpretable-ml-book" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>2 ed</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">One tree to explain them all</title>
		<author>
			<persName><forename type="first">U</forename><surname>Johansson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sönströd</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Löfström</surname></persName>
		</author>
		<idno type="DOI">10.1109/CEC.2011.5949785</idno>
		<ptr target="https://ieeexplore.ieee.org/abstract/document/5949785" />
	</analytic>
	<monogr>
		<title level="m">IEEE Congress of Evolutionary Computation (CEC)</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1444" to="1451" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Why did AI get this one wrong? -Tree-based explanations of machine learning model predictions</title>
		<author>
			<persName><forename type="first">E</forename><surname>Parimbelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Buonocore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Nicora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Michalowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wilk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bellazzi</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.artmed.2022.102471</idno>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence in Medicine</title>
		<imprint>
			<biblScope unit="volume">135</biblScope>
			<biblScope unit="page">102471</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Identifying representative trees from ensembles</title>
		<author>
			<persName><forename type="first">M</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-M</forename><surname>Noone</surname></persName>
		</author>
		<idno type="DOI">10.1002/sim.4492</idno>
	</analytic>
	<monogr>
		<title level="j">Statistics in medicine</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="1601" to="1616" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Identification of representative trees in random forests based on a new tree-based distance measure</title>
		<author>
			<persName><forename type="first">B.-H</forename><surname>Laabs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Westenberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">R</forename><surname>König</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11634-023-00537-7</idno>
		<ptr target="https://doi.org/10.1007/s11634-023-00537-7" />
	</analytic>
	<monogr>
		<title level="j">Advances in Data Analysis and Classification</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">C443: a Methodology to See a Forest for the Trees</title>
		<author>
			<persName><forename type="first">A</forename><surname>Sies</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Van Mechelen</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00357-019-09350-4</idno>
		<ptr target="https://doi.org/10.1007/s00357-019-09350-4" />
	</analytic>
	<monogr>
		<title level="j">Journal of Classification</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="730" to="753" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Classification and Regression Trees</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Friedman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Stone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Olshen</surname></persName>
		</author>
		<ptr target="https://books.google.de/books?id=JwQx-WOmSyQC" />
		<imprint>
			<date type="published" when="1984">1984</date>
			<publisher>Taylor &amp; Francis</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">N</forename><surname>Wright</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ziegler</surname></persName>
		</author>
		<idno type="DOI">10.18637/jss.v077.i01</idno>
		<ptr target="https://www.jstatsoft.org/index.php/jss/article/view/v077i01" />
	</analytic>
	<monogr>
		<title level="j">Journal of Statistical Software</title>
		<imprint>
			<biblScope unit="volume">77</biblScope>
			<biblScope unit="page" from="1" to="17" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Construction of Artificial Most Representative Trees by Minimizing Tree-Based Distance Measures</title>
		<author>
			<persName><forename type="first">B.-H</forename><surname>Laabs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">L</forename><surname>Kronziel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">R</forename><surname>König</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Szymczak</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-63797-1_15</idno>
	</analytic>
	<monogr>
		<title level="m">Explainable Artificial Intelligence</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Longo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Lapuschkin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Seifert</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Nature Switzerland</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="290" to="310" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
