<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Explaining Machine Learning DGA Detectors from DNS Traffic Data</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Giorgio</forename><surname>Piras</surname></persName>
							<email>giorgio.piras@unica.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Cagliari</orgName>
								<address>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Maura</forename><surname>Pintor</surname></persName>
							<email>maura.pintor@unica.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Cagliari</orgName>
								<address>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Pluribus One S.r.l</orgName>
								<address>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Luca</forename><surname>Demetrio</surname></persName>
							<email>luca.demetrio93@unica.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Cagliari</orgName>
								<address>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Pluribus One S.r.l</orgName>
								<address>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Battista</forename><surname>Biggio</surname></persName>
							<email>battista.biggio@unica.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Cagliari</orgName>
								<address>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Pluribus One S.r.l</orgName>
								<address>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Explaining Machine Learning DGA Detectors from DNS Traffic Data</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">CC19B641A6999681424B8E72F6305A00</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T15:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Machine Learning</term>
					<term>Explainability</term>
					<term>Cybersecurity</term>
					<term>DNS</term>
					<term>Network Security</term>
					<term>Monitoring and Detection</term>
					<term>ORCID 0000-0001-8225-6138 (G. Piras)</term>
					<term>ORCID 0000-0002-1944-2875 (M. Pintor)</term>
					<term>ORCID 0000-0001-5104-1476 (L. Demetrio)</term>
					<term>ORCID 0000-0001-7752-509X (B. Biggio)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>One of the most common causes of service disruption in online systems is a widespread cyberattack known as Distributed Denial of Service (DDoS), in which a network of infected devices (a botnet) is exploited, at the command of an attacker, to exhaust the computational capacity of a service. This attack leverages the Domain Name System (DNS) through Domain Generation Algorithms (DGAs), a stealthy connection strategy that nevertheless leaves suspicious data patterns. To detect such threats, advances in their analysis have been made, most of which adopt Machine Learning (ML) as a solution, as it can be highly effective in analyzing and classifying massive amounts of data. Although they perform strongly, ML models have a certain degree of obscurity in their decision-making process. To cope with this problem, a branch of ML known as Explainable ML tries to break down the black-box nature of classifiers and make them interpretable and human-readable. This work addresses the problem of Explainable ML in the context of botnet and DGA detection and, to the best of our knowledge, is the first to concretely break down the decisions of ML classifiers devised for botnet/DGA detection, providing both global and local explanations.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Over the last decades, our daily life has become tightly connected to the use of devices and online services, so that their efficiency and continuity play a crucial role in the technological transformation we are witnessing. Likewise, the economic loss derived from cyberthreats has increased exponentially in recent years <ref type="bibr" target="#b0">[1]</ref> as technologies continually evolve and attackers develop their skills. One of the most common ways cybercriminals try to jeopardize the continuity of systems, and thus cause economic damage, is Denial of Service (DoS), which aims to drain the computing capabilities of the target system in both sophisticated and basic ways. A case of this attack is the Distributed Denial of Service (DDoS), where a network of infected devices (bots) is commanded by an attacker (botmaster) through a Command&amp;Control (C&amp;C) server <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref>. What is erratic in this kind of attack, and thus detectable by a Machine Learning (ML) model, is the DNS traffic, which carries the domain names through which bots connect to the C&amp;C server. This stealthy connection strategy is commonly known as Domain Fluxing, and the algorithms used by the infected bots to generate the domains are known as Domain Generation Algorithms (DGAs).</p><p>Although employing ML models to detect the presence of botnets within network traffic has been demonstrated to be successful, almost all of the relevant works have followed a common baseline and workflow, presenting a partially novel feature set on which to train a classifier to obtain relevant results <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8]</ref>. 
The proposed approaches lack interpretability and contextualization. First, depending on the context from which the DNS traffic data is extracted and in which the model is deployed, potential attackers might have control over some features. Second, how the model prioritizes and uses the features in its decision process is not known beforehand, making the process challenging to debug and protect.</p><p>To address these problems, we first analyze the techniques used to detect botnets/DGAs from DNS data (Section 2); we then analyze which explainability techniques can provide insight into how the model takes its decisions (Section 3). Building on a re-implementation of the EXPOSURE system <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b4">5]</ref> (Section 4), we provide the following contributions: (i) we build and test the EXPOSURE system on a newly collected dataset; (ii) we observe statistics on the features used by the system; (iii) we train different classifiers and compare their performances; (iv) we obtain explanations from such classifiers; and (v) given the explanations, we develop and discuss an analysis of the features used by the system mentioned above. Finally, we conclude the work by presenting related works (Section 5), limitations, and future directions (Section 6).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background: DNS System and ML Techniques</head><p>From DNS to DGA. The Domain Name System (DNS) is a database responsible for mapping domain names to IP addresses, thus answering queries that clients make in the form of a domain name with the corresponding IP addresses. This action is commonly known as resolution <ref type="bibr" target="#b9">[10]</ref>. The DNS organizes domain names into a hierarchy (through dot-separated levels), as the whole technology itself is structured as a hierarchical database. The information stored and carried by DNS records includes A records, which return IPv4 addresses, NS records, which return authoritative name servers, and PTR (pointer) records, which return a domain name in response to a reverse query (i.e., a query that starts from an IP address rather than a domain name). It is also worth citing specific information carried by the DNS packets, such as the Time-to-live (TTL), which indicates how long the record may be kept in cache <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b9">10]</ref>. Being of paramount importance for the correct functioning of basic internet activities, the DNS is the perfect target for malicious activities having a high impact on unaware users. That is why this technology gets exploited by attackers (botmasters) who aim to command and control a network of infected machines, i.e., a botnet. A botnet that connects through a single domain name would be quickly taken down by vigilant authorities. To remain concealed, bots therefore generate massive DNS traffic while trying to connect to a much more concealed C&amp;C server. Such a significant amount of domain names is generated through Domain Generation Algorithms that, given a random seed, create strings, some of which may establish a connection.</p><p>Botnet Detection with ML: The EXPOSURE system. 
Seizing the chance to detect malicious patterns, the research community has directed its efforts towards analyzing DNS data, extracting features, and eventually training an ML model capable of distinguishing malicious from benign DNS behaviors. The EXPOSURE system <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b4">5]</ref> is among the most prominent works for the completeness of its feature set and its reproducibility (in terms of feature extraction). For this reason, we use it as a base for our explainability analysis. Table <ref type="table" target="#tab_0">1</ref> lists the features extracted by the EXPOSURE system (an extensive description can be found in the original work <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b4">5]</ref>). </p></div>
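<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the idea of seed-driven domain generation concrete, the following is a minimal sketch, not taken from any real malware family or from EXPOSURE. The hashing scheme, the function names toy_dga and numerical_char_ratio, and the default parameters are all assumptions made for illustration. The second function approximates a name-based feature in the spirit of num_chars%, computed here on the first label only.</p>
<p><![CDATA[
```python
import hashlib
from typing import List


def toy_dga(seed: str, count: int, length: int = 12, tld: str = ".com") -> List[str]:
    """Derive `count` pseudo-random domain names from a shared seed.

    Hypothetical scheme (an assumption, not a real DGA): hash the seed
    together with a counter and map the digest bytes to letters and digits.
    A SHA-256 digest has 32 bytes, so `length` should not exceed 32.
    """
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    domains = []
    for index in range(count):
        digest = hashlib.sha256(f"{seed}:{index}".encode("utf-8")).digest()
        label = "".join(alphabet[byte % len(alphabet)] for byte in digest[:length])
        domains.append(label + tld)
    return domains


def numerical_char_ratio(domain: str) -> float:
    """Fraction of digits among the characters of the first label."""
    label = domain.split(".")[0]
    if not label:
        return 0.0
    return sum(ch.isdigit() for ch in label) / len(label)
```
]]></p>
<p>Because bot and botmaster share the seed, both sides can derive the same candidate list independently, which is what makes the resulting DNS traffic pattern observable by a passive detector.</p></div>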
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Explaining Predictions of ML-based DNS Analysis</head><p>As pointed out by Miller et al. <ref type="bibr" target="#b11">[12]</ref>, explanations increase transparency and interpretability, so that both users and system designers can benefit from the resulting gain of trust. In security-relevant scenarios, like the one we are considering, understanding the data and the model provides the added benefit of helping to see if there are problems in the system, for example a model assigning high relevance to spurious features that should not influence it to that extent <ref type="bibr" target="#b12">[13]</ref>.</p><p>Analyzing the dataset's statistics provides further insights into how well the features separate the two classes. Additionally, in the case under investigation, many features come from similar sources and elaborations, so the statistical analysis comes in handy to highlight correlations and redundancies.</p><p>On top of that, we will use an ML model to analyze such features and categorize the samples into the two output classes of benign and malicious domains. Model explanations can help understand how the model makes such decisions. An explanation is said to be local if it is made on a single sample and describes how the model weighs the features of that specific sample in its classification. On the other hand, global explanations are made over entire datasets or relevant collections of samples to describe how the model prioritizes features over those samples <ref type="bibr" target="#b13">[14]</ref>. This work will focus on both local and global explanations.</p><p>In <ref type="bibr" target="#b14">[15]</ref>, Lundberg and Lee proposed SHAP (SHapley Additive exPlanations), where feature importance is computed with an additive approach, representing a unified measure of feature importance. 
The basic concept behind SHAP comes from Shapley values and a game theory setting, where the features act as players and cooperate in a coalitional game (i.e., the prediction task) to receive a profit (i.e., a gain, which is the actual prediction). The Shapley values assign payouts to players depending on their contribution to the total payout <ref type="bibr" target="#b15">[16]</ref>. Thus, the importance of each feature is computed as a weighted average of its marginal contributions over all possible coalitions of the other features. Since the exact computation requires evaluating every possible feature coalition, which is computationally expensive, Lundberg and Lee proposed a Shapley kernel that produces estimates instead of exact values.</p></div>
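<div xmlns="http://www.tei-c.org/ns/1.0"><p>The coalitional-game view described above can be sketched with an exact Shapley computation on a toy game. This is not the SHAP library and not the kernel approximation of Lundberg and Lee: it enumerates every coalition, which is feasible only for a handful of players. The value function toy_value and its numbers are invented for illustration; only the feature names echo those discussed in this work.</p>
<p><![CDATA[
```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions of the other players.

    `value` maps a frozenset of players to the payout of that coalition.
    The cost grows exponentially with the number of players.
    """
    n = len(players)
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value(coalition | {player}) - value(coalition)
                total += weight * marginal
        result[player] = total
    return result


def toy_value(coalition):
    """Hypothetical payout: additive terms plus one interaction bonus."""
    base = {"unique_ips": 0.4, "ttl_changes": 0.3, "num_chars": 0.1}
    score = sum(base[name] for name in coalition)
    if {"unique_ips", "ttl_changes"} <= coalition:
        score += 0.2  # bonus shared by the two interacting features
    return score
```
]]></p>
<p>In this toy game the interaction bonus is split evenly between the two interacting features, and the values of all players add up to the payout of the full coalition, which is the additivity property that SHAP relies on.</p></div>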
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data Analysis and Explainability Techniques</head><p>This section briefly explains the details and differences between the data analysis and explainability tools that play a central role in the experiments section.</p><p>Feature Statistical Analysis. These plots show the marginal distributions of every pair of features as density plots, describing how the distributions behave for the two classes. Through the scatter plots, instead, we can assess where benign and malicious samples lie in the feature space, and thus understand to which extent pairs of features separate the data. Analyzing the scatter plots gives a rough idea of how the features will discriminate between the classes and to which extent. Partial Dependence Plot. This plot shows the marginal effect that a single feature has on the prediction made by the model, thus providing global explanations. Taking as input the model, the feature, and a background distribution over which the feature importance is estimated, the Partial Dependence Plots (PDPs) depict the feature values on the x-axis, whilst the y-axis represents the expected prediction contribution given the feature value. In the background, a histogram shows the underlying data distribution of the feature values. A horizontal line represents the expected contribution to the prediction, and a vertical one represents the expected value of the feature. By reading this plot, we can measure how the observed feature contributes to the classification of the samples. Summary Plot. The SHAP summary plot, which, like the PDP, is a global explanation technique, shows how the model prioritizes the features and how these contribute to steering the classification towards each class. 
This plot comprises a list of features ordered from the one giving the highest contribution to the one giving the lowest, as interpreted by the model, showing the magnitude for benign (in blue) and malicious (in red) samples.</p><p>Force Plot. This technique is one of the local explainability methods provided in SHAP. It explains why a specific sample has been assigned a particular label. This can be useful for understanding why samples are misclassified and to which extent the classifier misunderstands them. Force plots show the magnitude of the feature contributions on single samples, rendered as blue arrows for contributions towards the benign class and red arrows for contributions towards the malicious class.</p><p>In the next section, we will use the presented techniques to explain the predictions of our re-implementation of the EXPOSURE system.</p></div>
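<div xmlns="http://www.tei-c.org/ns/1.0"><p>The curve behind a Partial Dependence Plot can be sketched in a few lines using the classic definition: fix the feature of interest to each grid value in every background sample and average the model output. This is a hand-written illustration and not the SHAP plotting routine; the names partial_dependence and toy_model, the feature layout, and the numbers are assumptions.</p>
<p><![CDATA[
```python
def partial_dependence(predict, background, feature_index, grid):
    """Average model output when one feature is forced to each grid value.

    `predict` takes a list of feature values and returns a score;
    `background` is a list of such feature lists and is left unmodified.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in background:
            modified = list(row)  # copy, so the background stays intact
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(background))
    return curve


def toy_model(row):
    """Hypothetical scorer: step on feature 0, mild slope on feature 1."""
    score = 0.8 if row[0] < 3 else 0.1
    return score + 0.001 * row[1]
```
]]></p>
<p>With a step-like model such as this one, the resulting curve is itself a step, which is the shape one would look for when reading a PDP to derive a threshold-based policy.</p></div>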
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head><p>In the experimental section of this work, we first describe our re-implementation of the EXPOSURE system. Then we discuss the DNS traffic data from which we extracted the features, followed by a brief model selection made to improve the system's performance. Finally, we delve into the results to show how explanations applied in this context can bring the analysis to the next level.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Re-implementation of the EXPOSURE system</head><p>Dataset. The DNS traffic was collected from recursive servers on which, through sniffers, we were able to save the data as .pcap files for the entire month of January 2021. Given the massive amount of traffic, summing up to 15 GB of data per day, we filtered out packets whose label was not known by either black or white lists and domain names that did not resolve (NXDOMAIN as response code). We used the list of most popular suffixes from the Alexa website<ref type="foot" target="#foot_0">1</ref> to label benign domains, and the list from DGArchive <ref type="bibr" target="#b16">[17]</ref> to flag the malicious samples. The remaining packets (203,034 domains, of which 25,882 benign and 177,152 malicious; note that benign domains re-appear much more frequently than the malicious ones, which are almost always unique) have then been passed through the feature extractor we implemented, and are distributed over the days as shown in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>Model Selection. In the original EXPOSURE work, the authors used a J48 Decision Tree to obtain overall good performance. We additionally benchmarked several models, namely Decision Tree (DT), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), AdaBoost (ADA), and Random Forest (RF). After estimating the hyperparameters through a Grid Search (the best overall values are reported in Section 7), we compared the best models with the best parameters on two different days (i.e., two different sample balances). The first set of ROC curves in Figure <ref type="figure" target="#fig_2">2a</ref> was obtained using a more balanced day of data (a day in mid-January). The second set of curves in Figure <ref type="figure" target="#fig_2">2b</ref> was obtained from a day of data with very few malicious samples, showing how the performance of the classifiers dropped markedly. 
Overall, throughout the most balanced days, RF and ADA have proven to be the most consistent classifiers. For this reason, they have been selected as classifiers for the rest of the experiments. We reckon that the mid-month days of capture are also more suitable for the rest of the analysis, and they have therefore been used for all of the following experiments.</p></div>
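<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a reminder of what the ROC comparison measures, the sketch below computes ROC points and the area under the curve from labels and classifier scores in plain Python. It is an illustrative re-derivation, not the code used for Figure 2; the function names and the convention that label 1 marks a malicious sample are assumptions.</p>
<p><![CDATA[
```python
def roc_curve_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    Samples are ranked by decreasing score; a point is emitted after each
    group of tied scores, so ties are handled as a single threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are needed to draw a ROC curve")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = 0
    false_pos = 0
    for position, (score, label) in enumerate(ranked):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        is_last = position == len(ranked) - 1
        if is_last or ranked[position + 1][0] != score:
            points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```
]]></p>
<p>The explicit error for a single-class input reflects the situation discussed above: on a day with very few malicious samples, the true positive rate rests on a handful of points and the curve becomes unstable.</p></div>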
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Feature and Explainability Analysis on EXPOSURE</head><p>We now present our results on the analysis of the feature statistics and the insight obtained by applying explainability techniques to interpret the decisions taken by the machine-learning models used in our DGA detector. The proposed plots have been implemented through the Python libraries Seaborn <ref type="bibr" target="#b17">[18]</ref> and SHAP <ref type="bibr" target="#b18">[19]</ref>. Statistical Analysis. The statistical analysis shows an overview of the correlation and distribution of the features. As shown in Figure <ref type="figure" target="#fig_4">3a</ref>, some features, like %of_lms and num_chars%, do not perfectly separate the data into the two classes when considered jointly. In particular, %of_lms reaches a plateau in malicious domains once over 0.8 (bottom-right plot, depicting the distribution of the feature), which describes how algorithmically-generated domains tend, in most cases, not to have a single meaningful word covering their entire name, yet there are exceptions in both directions. This might be caused by the diversity of malware families: some, like the "Gameover" DGA, used to mix numbers and characters, whereas others, such as "Gozi", used to mix words from openly accessible documents, such as the US constitution <ref type="bibr" target="#b19">[20]</ref>. In the plot of Figure <ref type="figure" target="#fig_4">3b</ref>, depicting time-based features, malicious domains show a more volatile behavior, which is reasonable if we think about the diversity of applications in which they can be used. In Figure <ref type="figure" target="#fig_13">8</ref>, shown in the Appendix, we can observe interesting TTL behaviors characterizing the domains. 
Contrary to the now old-fashioned belief that a low TTL is typical only of malicious domains <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22]</ref>, as it makes malicious records stay in caches for a shorter time, we show that benign domains can also present this behavior depending on the application in which they are used, e.g., to handle critical resources <ref type="bibr" target="#b22">[23]</ref> or for load balancing purposes <ref type="bibr" target="#b21">[22]</ref>. Interpreting Global Explanations. As pointed out in Section 3, different models can use features in different ways. Global explanations can uncover these behaviors and make the analyst aware of the feature prioritization of a model. Decision-tree-based classifiers such as ADA and RF, shown respectively in the summary plots of Figure <ref type="figure" target="#fig_6">4a</ref> and Figure <ref type="figure" target="#fig_6">4b</ref>, share four out of the five most important features, which is likely a consequence of their similar tree-based nature. In both of them, unique_ips notably brings the highest contribution. In Section 7, we show how other classifiers assign a low magnitude to the unique_ips feature, whilst prioritizing a diverse subset of TTL features. This further shows that it is not possible to rely solely on statistical analysis to foresee how the features will be used, as class separations that at first glance look either weak or strong can be subverted. Partial Dependence Plots show the marginal effect of a single feature globally on the predictions. Considering a trained classifier (RF in the case of this analysis) and a background distribution, through SHAP we can assess how the considered feature contributes to classifying the background samples over their values. 
An advisable security-related use of these plots is to employ a background distribution of malicious samples, thus analyzing to which extent the feature values contribute to classifying the sample as malicious. The following plots (after normalization) have been made using a background distribution of 1000 malicious samples on the RF classifier. Figure <ref type="figure" target="#fig_8">5a</ref> shows the PDP of the strongest feature of the model. The plot resembles a "gentle" step, producing the highest contribution on very low feature values and the lowest with values going just slightly over the threshold. A behavior very similar to that of the unique_ips feature is shown by the number of changes in the TTL, depicted in Figure <ref type="figure" target="#fig_12">7</ref> in Section 7. These plots help, for example, to understand the extent to which features contribute to the classification of the domain as malicious, and possibly to set policies and restrictions based on easily tweakable features, such as num_chars% in Figure <ref type="figure" target="#fig_8">5b</ref>. In this plot, we can see how a high rate of numerical characters leads to a solid contribution to the prediction of the domain as malicious. Surprisingly, a 20% rate of numerical characters in the domain string leads to an even bigger magnitude, which can be caused by the relevant presence of some malware families not having numbers in their "regex". In this case, a possible security policy for an environment hosting an EXPOSURE-like system would be to allow domain names whose rate of numerical characters falls within the 20%-40% range. Summary of the Results. As a result of the presented experiments, we can reflect on the issues of feature management and hypothetical counteractions. From the statistical analysis, we understand the distribution and correlation of the features, and how the DGAs in our traffic tend to behave. 
However, the global explanations can turn the tables and quantify how the model perceives the features. Finally, we can see how features drive the prediction of a sample via local explanations. This ensemble of analyses makes us notice how the overall feature prioritization depends on both the model used and the considered data, which further proves how context-dependent the behavior of such systems can be. Hence, an explainability analysis should always be used to better portray the big picture of both the system and the employed data. In our case, the big picture has led to a more in-depth analysis of the EXPOSURE system. The TTL features, the subject of this analysis, account for 37.5% of the entire feature set, being 9 out of 24. It turns out from our analysis of the explanations that they contribute massively to the misclassification of several samples (such as Figure <ref type="figure" target="#fig_10">6b</ref> and Figure <ref type="figure" target="#fig_10">6c</ref>), as they play high-magnitude roles in the summary plots of Figure <ref type="figure" target="#fig_6">4a</ref> and Figure <ref type="figure" target="#fig_6">4b</ref> (which are the best classifiers overall). This makes this feature set highly powerful for the whole system, yet in its power also lies a crucial problem. Namely, attackers can manipulate these features, being completely free to tune the TTL and balance their caching time (i.e., the likelihood of being detected) with the chance of evading the classification of the system. Having such a relevant portion of the feature set reserved for values that can be crafted directly by the attackers can serve as a significant stepping-stone for them. Furthermore, if the system is deployed to manage critical resources, some part of the feature set can, besides being evaded, be overridden by the context. 
Some works like <ref type="bibr" target="#b22">[23]</ref> point out how security-sensitive systems, e.g., banking applications, should indeed carefully set their DNS TTL to a low value. The upshot of these observations is that a botnet/DGA detector cannot rely solely on accuracy metrics to establish its efficiency, in that analysts also need to be aware of the model, the data, and the context therein. Explanations can be of crucial help in this regard, helping prevent major issues from happening and allowing debugging of the model. Considering our system, through explanations, we have seen how dangerously influential TTL-based features are in most of the models. Considering their extensive use in the feature set, appropriate security measures should be taken (e.g., reducing their number, as for Name-Based features, which are just as easily adjustable but account for only about 8% of the feature set). We firmly believe that through explanations we can rapidly enhance the usage of and trust in AI, as companies can look at such security systems from a human-readable perspective, and model biases can be analyzed and studied.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Related Work</head><p>DNS Analysis. Several promising works have striven to tackle botnet/DGA detection during the last decade, often showing innovative passive DNS features and methodologies. Notable works besides EXPOSURE include Notos <ref type="bibr" target="#b23">[24]</ref>, in which Antonakakis et al. created the first relevant and efficient reputation system for domains from various data sources, and Pleiades <ref type="bibr" target="#b24">[25]</ref>, in which the authors focused on NXDOMAIN records to both cluster domains and classify DGAs by looking at the association between strings. Finally, in FANCI <ref type="bibr" target="#b5">[6]</ref>  <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b29">30]</ref>. Considering that SHAP is a more reliable tool, we chose it for our explainability work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions and Future Work</head><p>In this work, we proposed an explanatory analysis of ML classifiers devised for botnet/DGA detection. Starting from the implementation of the EXPOSURE feature set on our traffic data, we have shown that, despite prior statistical assumptions on malware behavior within the network, a model can globally interpret features in its own way, prioritizing certain features over others that were expected to matter. We also demonstrated how different models can use the features differently, to which analysts should adapt and debug accordingly. Locally, we have seen how certain features can contribute and how explanations can make analysts and users aware of the single decisions and motivations behind misclassified samples. Through these analyses, we raised concerns about how the features and the model can be biased by the context in which the systems are both trained and deployed. Our analysis advances the comprehension of such contexts and favors the adoption of such systems, as they can first be interpreted and adapted, and subsequently accepted. In this regard, several extensions of this work can be developed, aiming at fairness and legal regularization of the detectors through explanations and, if possible, bringing them into debugging/pipelining processes to obtain an efficient and explainable system. Additionally, explanations can be instrumental when humans want to be involved in the decision-making. All in all, this work demonstrated how powerful explanations can be and how security, debugging, interpretability, and fairness can be brought to the next level when ML is applied to detection, where security has to be assessed and interpreted throughout the process chain.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Appendix</head><p>In this additional section, we show several plots that have been cited in the previous sections and that we believe can support the comprehension of the work.</p><p>Grid Search. Using the Scikit-Learn Python suite, we optimized the parameters through the GridSearchCV API. The results of the optimization have been reported for completeness in Listing 1.</p></div>
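<div xmlns="http://www.tei-c.org/ns/1.0"><p>The exhaustive enumeration at the heart of a grid search can be sketched without any dependency, as below. This is a simplified stand-in and not the GridSearchCV implementation, which additionally cross-validates every candidate. The function grid_search, the scoring function toy_score, and the parameter names and values are hypothetical and do not reproduce the grid or the results of this work.</p>
<p><![CDATA[
```python
from itertools import product


def grid_search(param_grid, score):
    """Try every combination in `param_grid` and keep the best-scoring one.

    `param_grid` maps a parameter name to a list of candidate values;
    `score` maps a dict of chosen parameters to a number (higher is better).
    """
    names = sorted(param_grid)
    best_params = None
    best_score = float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params):
    """Hypothetical score peaking at max_depth=8 and n_estimators=100.

    In practice this would be a validation metric of a trained model.
    """
    depth_penalty = abs(params["max_depth"] - 8)
    size_penalty = abs(params["n_estimators"] - 100) / 100
    return -(depth_penalty + size_penalty)
```
]]></p>
<p>The number of candidates is the product of the list lengths, so the cost grows quickly with every added parameter, which is one reason to keep the grid small when each evaluation involves training a classifier.</p></div>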
<div xmlns="http://www.tei-c.org/ns/1.0"><head>TTL Features plots.</head><p>Since the discussion of the explainability analysis focused almost entirely on the TTL features, some additional plots can point out interesting behaviors, such as Figure <ref type="figure" target="#fig_13">8</ref>, which shows the statistical analysis of the first 4 TTL-based features. In Figure <ref type="figure" target="#fig_12">7</ref>, instead, we can see how few changes in the TTL values mean a low contribution to the classification of the sample as malicious, and vice versa.</p><p>Additional Summary Plots. Figure <ref type="figure" target="#fig_14">9</ref> shows how unique_ips is much less considered than the TTL-based features by the KNN classifier, which again shows how diverse models can be. The same goes for the SVC classifier in Figure <ref type="figure" target="#fig_15">10</ref>, which once again does not employ the unique_ips feature as much as the decision-tree-based classifiers do.</p><p>Additional Force Plots. The plots of Figure <ref type="figure" target="#fig_16">11</ref> show a variety of samples either correctly classified or misclassified by the RF model, demonstrating in practice how the most relevant features can play a major role in any classification scenario, whether the outcome is correct or wrong.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Malicious and legitimate domains for every day.</figDesc><graphic coords="6,89.29,84.19,416.68,104.17" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>(a) Classifiers trained on day 14, whose class distribution is unbalanced, but less markedly than on the other days in the dataset. (b) The same classifiers trained on day 5, which presents a highly unbalanced distribution of the two classes.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: The ROC curves obtained using features from day 14 (left) and day 5 (right).</figDesc><graphic coords="6,100.17,328.87,187.50,187.50" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>(a) Correlation between name-based features. (b) Correlation between time-based features.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Density and scatter plots for two pairs of name-based (left) and time-based features (right). The plots are based on the 12th day of capture, which counts 1324 malicious samples and 1324 benign samples (subsampled from 6925).</figDesc><graphic coords="7,100.17,389.62,187.51,172.02" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head></head><label></label><figDesc>(a) SHAP summary plot of feature contributions on ADA BOOST classifier. (b) SHAP summary plot of feature contributions on RANDOM FOREST classifier.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Global summary plots for ADA (left) and RF (right) classifiers.</figDesc><graphic coords="8,307.60,191.00,197.92,213.42" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head></head><label></label><figDesc>(a) SHAP PDP on unique_ips feature. (b) SHAP PDP on num_chars% feature.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Partial Dependence plots for unique_ips (left) and num_chars% (right).</figDesc><graphic coords="9,101.41,163.90,187.51,140.63" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head></head><label></label><figDesc>(a) Force plot of the benign domain sample spring.io, correctly classified as benign. (b) Force plot of the malicious domain sample fgc.se, misclassified as benign. (c) Force plot of the benign domain sample topeleven.com, misclassified as malicious.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Local explanations on three domains from the dataset.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Listing 1 :</head><label>1</label><figDesc>Listing 1: Grid Search Results. RANDOM FOREST parameters: {'criterion': 'entropy', 'max_depth': 20, 'n_estimators': 125}. ADA-BOOST parameters: {'algorithm': 'SAMME', 'n_estimators': 175}. K-NEAREST NEIGHBORS parameters: {'n_neighbors': 13, 'weights': 'distance'}. DECISION TREE parameters: {'criterion': 'gini', 'max_depth': 5}. SVC RBF parameters: {'C': 297.63514416313194, 'gamma': 0.6951927961775591}.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_12"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: SHAP PDP on ttl_changes feature.</figDesc><graphic coords="16,193.47,289.18,208.35,156.26" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_13"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Correlation between the first four TTL-based features.</figDesc><graphic coords="17,89.29,167.96,416.66,398.72" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_14"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: SHAP summary plot of feature contributions on KNN classifier.</figDesc><graphic coords="18,151.80,210.29,291.68,314.05" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_15"><head>Figure 10 :</head><label>10</label><figDesc>Figure 10: SHAP summary plot of feature contributions on SVC classifier.</figDesc><graphic coords="19,151.80,93.57,291.68,314.05" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_16"><head>Figure 11 :</head><label>11</label><figDesc>Figure 11: Local explanations on three other domains from the dataset.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>List of the features used in EXPOSURE, renamed with mnemonics. The first column indicates the feature subset. The second shows the number of atomic features that each feature comprises. The third and fourth columns indicate, respectively, the feature name chosen by the EXPOSURE authors and our names for the atomic features.</figDesc><table><row><cell>Feature set</cell><cell>#</cell><cell>Paper Feature Name</cell><cell>Our Feature Names</cell></row><row><cell>Time-Based Features</cell><cell>2</cell><cell>Short Life</cell><cell>glob_short_lived</cell></row><row><cell></cell><cell></cell><cell></cell><cell>glob_life_ratio</cell></row><row><cell></cell><cell>1</cell><cell>Daily similarity</cell><cell>daily_similarity</cell></row><row><cell></cell><cell>2</cell><cell>Repeating patterns</cell><cell>local_numOf_changes</cell></row><row><cell></cell><cell></cell><cell></cell><cell>stddev_before_change</cell></row><row><cell></cell><cell>2</cell><cell>Access Ratio</cell><cell>idle</cell></row><row><cell></cell><cell></cell><cell></cell><cell>popular</cell></row><row><cell>DNS Answer-Based Features</cell><cell>1</cell><cell>Number of distinct IP addresses</cell><cell>unique_ips</cell></row><row><cell></cell><cell>1</cell><cell>Number of distinct countries</cell><cell>unique_ccode</cell></row><row><cell></cell><cell>3</cell><cell>Reverse DNS query result</cell><cell>rev_arec</cell></row><row><cell></cell><cell></cell><cell></cell><cell>rev_nsrec</cell></row><row><cell></cell><cell></cell><cell></cell><cell>rev_asnrec</cell></row><row><cell></cell><cell>1</cell><cell>Number of domains sharing the same IP</cell><cell>shared_ips</cell></row><row><cell>TTL Value-Based Features</cell><cell>1</cell><cell>Average TTL</cell><cell>ttl_avg</cell></row><row><cell></cell><cell>1</cell><cell>Standard Deviation of TTL</cell><cell>ttl_stddev</cell></row><row><cell></cell><cell>1</cell><cell>Number of distinct TTL values</cell><cell>unique_ttls</cell></row><row><cell></cell><cell>1</cell><cell>Number of TTL changes</cell><cell>ttl_changes</cell></row><row><cell></cell><cell>5</cell><cell>Percentage usage of TTL ranges</cell><cell>ttl_range1</cell></row><row><cell></cell><cell></cell><cell></cell><cell>ttl_range100</cell></row><row><cell></cell><cell></cell><cell></cell><cell>ttl_range300</cell></row><row><cell></cell><cell></cell><cell></cell><cell>ttl_range900</cell></row><row><cell></cell><cell></cell><cell></cell><cell>ttl_rangeinf</cell></row><row><cell>Domain Name-Based Features</cell><cell>1</cell><cell>% of numerical characters</cell><cell>num_chars%</cell></row><row><cell></cell><cell>1</cell><cell>% of length of the LMS</cell><cell>%of_lms</cell></row></table></figure>
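<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the TTL value-based features of Table 1 more concrete, the following sketch computes them from the sequence of TTL values observed for a single domain. It is an illustration under stated assumptions, not the extraction code used in this work: the function name ttl_features is hypothetical, the bucket edges (1, 100, 300, 900 seconds) are inferred only from the feature names, ttl_changes is assumed to count differences between consecutive observations, and the population standard deviation is an arbitrary choice.</p>

```python
import bisect
import statistics

# Assumed bucket edges in seconds, inferred only from the names in Table 1.
TTL_EDGES = [1, 100, 300, 900]
TTL_RANGE_NAMES = [
    "ttl_range1", "ttl_range100", "ttl_range300", "ttl_range900", "ttl_rangeinf",
]


def ttl_features(ttls):
    """Compute illustrative TTL value-based features for one domain."""
    if not ttls:
        raise ValueError("at least one TTL observation is required")
    counts = [0] * len(TTL_RANGE_NAMES)
    for ttl in ttls:
        # bisect_right assigns each TTL to a half-open bucket [low, high).
        counts[bisect.bisect_right(TTL_EDGES, ttl)] += 1
    features = {
        "ttl_avg": statistics.mean(ttls),
        "ttl_stddev": statistics.pstdev(ttls),
        "unique_ttls": len(set(ttls)),
        # Assumption: a change is any difference between consecutive TTLs.
        "ttl_changes": sum(1 for prev, cur in zip(ttls, ttls[1:]) if prev != cur),
    }
    # Express each bucket as a percentage of all observations.
    for name, count in zip(TTL_RANGE_NAMES, counts):
        features[name] = 100.0 * count / len(ttls)
    return features


example = ttl_features([60, 60, 300, 60, 3600])
print(example)
```

<p>For the example sequence, three of the five observations fall in the assumed range ending at 100 seconds, and the TTL changes three times between consecutive observations.</p></div>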
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>, Schueppen et al. developed a detector based on a small feature set such as the EXPOSURE one, though focusing only on NXDOMAIN passive data. All of these works have reached comparable performance in different settings. None of them, though, has focused on the explainability of such a critical application. The only works to have addressed this problem focused on multiclass classification with deep learning approaches, thus classifying the malicious domains with family pairing. In <ref type="bibr" target="#b25">[26]</ref>, Becker et al. proposed a visual analytics system for Deep Learning (DL) models, providing graphical insights into statistical properties of the domain name string. Drichel et al., in <ref type="bibr" target="#b26">[27]</ref>, briefly highlighted some string-wise interpretations for DL models starting from the misclassified samples. In contrast, in <ref type="bibr" target="#b27">[28]</ref>, Drichel et al. proposed feature-based classifiers built on string features for multiclass classification, with the purpose of improving explainability. Firstly, none of these three works focused on passive DNS data, choosing string-based features to ease the computational burden. Secondly, none developed an explanatory analysis, focusing instead on how the model could be made more explainable or, at most, on how to visualize a few string patterns from a DL model. Our work focuses on passive DNS traffic data, analyzing features from a comprehensive viewpoint and not limiting them to the human-readable string features. Additionally, we propose both local and global explanations, concretely enhancing the awareness of how a model behaves in such a context.</figDesc><table><row><cell>Explainability Techniques. In [29], Ribeiro et al. proposed LIME (Local Interpretable Model-Agnostic Explanations), an explainability method that learns a local model approximating the classifier around the prediction. Despite its wide use, several concerns about stability and consistency have been raised about LIME</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.alexa.com/topsites</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been partly supported by the PRIN 2017 project RexLearn, funded by the Italian Ministry of Education, University and Research (grant no. 2017TWNMH2); and by the project TESTABLE (grant no. 101019206), under the EU's H2020 research and innovation programme.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Firm</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Lostri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename></persName>
		</author>
		<ptr target="https://books.google.it/books?id=mG0jzgEACAAJ" />
	</analytic>
	<monogr>
		<title level="m">The Hidden Costs of Cybercrime</title>
				<imprint>
			<publisher>McAfee</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Botnet: An Overview</title>
		<author>
			<persName><forename type="first">R</forename><surname>Puri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Bots &amp;</title>
				<imprint>
			<publisher>Elsevier</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page">17</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Know Your Enemy: Fast-Flux Service Networks</title>
		<author>
			<persName><forename type="first">W</forename><surname>Salusky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Danford</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Honeypot Project</title>
				<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Your botnet is my botnet: analysis of a botnet takeover</title>
		<author>
			<persName><forename type="first">B</forename><surname>Stone-Gross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cavallaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Gilbert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Szydlowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kemmerer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kruegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Vigna</surname></persName>
		</author>
		<idno type="DOI">10.1145/1653662.1653738</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th ACM conference on Computer and communications security -CCS &apos;09</title>
				<meeting>the 16th ACM conference on Computer and communications security -CCS &apos;09<address><addrLine>Chicago, Illinois, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page">635</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Exposure: Finding malicious domains using passive dns analysis</title>
		<author>
			<persName><forename type="first">L</forename><surname>Bilge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kirda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kruegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Balduzzi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Network and Distributed System Security Symposium (NDSS)</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">FANCI: Feature-based automated nxdomain classification and intelligence</title>
		<author>
			<persName><forename type="first">S</forename><surname>Schüppen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Teubert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Herrmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Meyer</surname></persName>
		</author>
		<ptr target="https://www.usenix.org/conference/usenixsecurity18/presentation/schuppen" />
	</analytic>
	<monogr>
		<title level="m">27th USENIX Security Symposium (USENIX Security 18)</title>
				<meeting><address><addrLine>Baltimore, MD</addrLine></address></meeting>
		<imprint>
			<publisher>USENIX Association</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1165" to="1181" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Themis: A Novel Detection Approach for Detecting Mixed Algorithmically Generated Domains</title>
		<author>
			<persName><forename type="first">C</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Qiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="DOI">10.1109/MSN48538.2019.00057</idno>
	</analytic>
	<monogr>
		<title level="m">15th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="259" to="264" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Phoenix: Dga-based botnet tracking and intelligence</title>
		<author>
			<persName><forename type="first">S</forename><surname>Schiavoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Maggi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cavallaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zanero</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Detection of intrusions and malware, and vulnerability assessment</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="192" to="211" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Exposure: A passive dns analysis service to detect and report malicious domains</title>
		<author>
			<persName><forename type="first">L</forename><surname>Bilge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Balzarotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kirda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kruegel</surname></persName>
		</author>
		<idno type="DOI">10.1145/2584679</idno>
		<ptr target="https://doi.org/10.1145/2584679" />
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Inf. Syst. Secur</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">V</forename><surname>Mockapetris</surname></persName>
		</author>
		<ptr target="https://tools.ietf.org/html/rfc1035" />
		<title level="m">Domain names - implementation and specification</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">V</forename><surname>Mockapetris</surname></persName>
		</author>
		<ptr target="https://tools.ietf.org/html/rfc1034" />
		<title level="m">Domain names - concepts and facilities</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Miller</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.07269</idno>
		<title level="m">Explanation in Artificial Intelligence: Insights from the Social Sciences</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Dos and don&apos;ts of machine learning in computer security</title>
		<author>
			<persName><forename type="first">D</forename><surname>Arp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Quiring</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Pendlebury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Warnecke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Pierazzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wressnegger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cavallaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rieck</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st USENIX Security Symposium</title>
				<meeting>the 31st USENIX Security Symposium</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">5 Properties of Explanations | Interpretable Machine Learning</title>
		<author>
			<persName><forename type="first">C</forename><surname>Molnar</surname></persName>
		</author>
		<ptr target="https://christophm.github.io/interpretable-ml-book/properties.html" />
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1705.07874</idno>
		<title level="m">A Unified Approach to Interpreting Model Predictions</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note>cs, stat</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Shapley Value</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hart</surname></persName>
		</author>
		<idno type="DOI">10.1057/978-1-349-95121-5_1369-2</idno>
		<ptr target="https://doi.org/10.1057/978-1-349-95121-5_1369-2" />
	</analytic>
	<monogr>
		<title level="m">The New Palgrave Dictionary of Economics</title>
				<meeting><address><addrLine>UK, London</addrLine></address></meeting>
		<imprint>
			<publisher>Palgrave Macmillan</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A comprehensive measurement study of domain generating malware</title>
		<author>
			<persName><forename type="first">D</forename><surname>Plohmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yakdan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Klatt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gerhards-Padilla</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">25th USENIX Security Symposium (USENIX Security 16)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="263" to="278" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">seaborn: statistical data visualization</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Waskom</surname></persName>
		</author>
		<idno type="DOI">10.21105/joss.03021</idno>
		<ptr target="https://doi.org/10.21105/joss.03021" />
	</analytic>
	<monogr>
		<title level="j">Journal of Open Source Software</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page">3021</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<ptr target="https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainable%20AI%20with%20Shapley%20values.html" />
		<title level="m">An introduction to explainable AI with Shapley values -SHAP latest documentation</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">F</forename><surname>Daniel Plohmann</surname></persName>
		</author>
		<ptr target="https://dgarchive.caad.fkie.fraunhofer.de" />
		<title level="m">Dgarchive</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Identifying botnets using anomaly detection techniques applied to dns traffic</title>
		<author>
			<persName><forename type="first">R</forename><surname>Villamarin-Salomon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Brustoloni</surname></persName>
		</author>
		<idno type="DOI">10.1109/ccnc08.2007.112</idno>
	</analytic>
	<monogr>
		<title level="m">5th IEEE Consumer Communications and Networking Conference</title>
				<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="476" to="481" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A survey of botnet detection based on DNS</title>
		<author>
			<persName><forename type="first">K</forename><surname>Alieyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Almomani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Manasrah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Kadhum</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00521-015-2128-0</idno>
		<ptr target="http://link.springer.com/10.1007/s00521-015-2128-0" />
	</analytic>
	<monogr>
		<title level="j">Neural Computing and Applications</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="1541" to="1558" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">The Role of DNS TTL Values in Potential DDoS Attacks: What Do the Major Banks Know About It?</title>
		<author>
			<persName><forename type="first">N</forename><surname>Vlajic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Andrade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><forename type="middle">T</forename><surname>Nguyen</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.procs.2012.06.060</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S1877050912004176" />
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="466" to="473" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Building a dynamic reputation system for dns</title>
		<author>
			<persName><forename type="first">M</forename><surname>Antonakakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Perdisci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dagon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Feamster</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">USENIX security symposium</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="273" to="290" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">From throw-away traffic to bots: detecting the rise of dga-based malware</title>
		<author>
			<persName><forename type="first">M</forename><surname>Antonakakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Perdisci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Nadji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Vasiloglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Abu-Nimeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dagon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Presented as part of the 21st USENIX Security Symposium (USENIX Security 12)</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="491" to="506" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Interpretable visualizations of deep neural networks for domain generation algorithm detection</title>
		<author>
			<persName><forename type="first">F</forename><surname>Becker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Drichel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ertl</surname></persName>
		</author>
		<idno type="DOI">10.1109/VizSec51108.2020.00010</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Symposium on Visualization for Cyber Security (VizSec)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="25" to="29" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Analyzing the real-world applicability of dga classifiers</title>
		<author>
			<persName><forename type="first">A</forename><surname>Drichel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Meyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schüppen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Teubert</surname></persName>
		</author>
		<idno type="DOI">10.1145/3407023.3407030</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th International Conference on Availability, Reliability and Security</title>
				<meeting>the 15th International Conference on Availability, Reliability and Security</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1" to="11" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">First step towards explainable dga multiclass classification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Drichel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Faerber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Meyer</surname></persName>
		</author>
		<idno type="DOI">10.1145/3465481.3465749</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th International Conference on Availability, Reliability and Security</title>
				<meeting>the 16th International Conference on Availability, Reliability and Security</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1" to="13" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">"Why Should I Trust You?": Explaining the Predictions of Any Classifier</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Guestrin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1602.04938</idno>
	</analytic>
	<monogr>
		<title/>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">S-LIME: Stabilized-LIME for Model Explanation</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hooker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1145/3447548.3467274</idno>
		<idno type="arXiv">arXiv:2106.07875</idno>
		<ptr target="http://arxiv.org/abs/2106.07875" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &amp; Data Mining</title>
				<meeting>the 27th ACM SIGKDD Conference on Knowledge Discovery &amp; Data Mining</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2429" to="2438" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
