Network Security Modelling with Distributional Data

Subhabrata Majumdar¹, Ganesh Subramaniam²
¹ AI Risk and Vulnerability Alliance, Seattle, WA, USA
² AT&T Data Science and AI Research, Bedminster, NJ, USA

Abstract
We investigate the detection of botnet command and control (C2) hosts in massive IP traffic using machine learning methods. To this end, we use NetFlow data—the industry standard for monitoring IP traffic—and ML models built on two sets of features: conventional NetFlow variables and distributional features derived from NetFlow variables. In addition to static summaries of NetFlow features, we use quantiles of their IP-level distributions as input features in models that predict whether an IP belongs to known botnet families. These models are used to develop intrusion detection systems that flag traffic traces associated with malicious attacks. The results are validated by matching predictions to existing denylists of published malicious IP addresses and by deep packet inspection. Our proposed novel distributional features, combined with techniques that enable modelling complex input feature spaces, result in highly accurate predictions by our trained models.

Keywords
Cybersecurity, NetFlow data, Botnet, Command & Control, Machine learning, Quantiles

1. Introduction
Security monitoring of Internet Protocol (IP) traffic is an important problem that is growing in prominence. An exploding volume of internet traffic, and the wide variety of devices connecting to the internet in recent years, have contributed to an increase in malicious activity that can harm both individual devices and carrier networks. Given the lasting damage of internet security breaches [1], it is important to monitor IP traffic for malicious activity and to flag, in real time, anomalous external IP addresses that may be causing or directing this activity through communications with internal devices.
There are a large number of challenging statistical problems in network security. For example, there is ongoing research on the identification of various malicious events such as scanning, password guessing, Distributed Denial of Service (DDoS) attacks, malware injection, and different spam attacks [2]. The focus of this paper is on the detection of botnet attacks—specifically, identifying host IP addresses (known as "Command and Control" or C2 servers) that send instructions to infected bots (devices) on the nature of the attack to be perpetrated.

CAMLIS'22: Conference on Applied Machine Learning in Information Security (CAMLIS), October 20–21, 2022, Arlington, VA
Contact: zoom.subha@gmail.com (S. Majumdar); gs3185@att.com (G. Subramaniam); https://subhomajumdar.com/ (S. Majumdar); ORCID 0000-0003-3529-7820 (S. Majumdar)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Reviewing the literature in network security, we observed that the current trend of NetFlow analytics and ML modelling is device-centric, i.e., the analysis of the internet traffic routed through a device to determine whether it contains malicious activity. For example, Evangelou and Adams [3] used regression trees to model individual device behavior based on input features constructed from historic NetFlow data. In contrast, we perform a host-centric analysis, looking for host IPs (that devices are connecting to) acting in a possibly malicious way, particularly as the command and control server (C2) of a botnet. While a bot device may have only a small proportion of its traffic marked as malicious, the host/C2 will have most of its traffic involved in the malicious activity, and will therefore generate a stronger signature. Our analysis is aimed at finding such signatures.

Scanning the literature for methodology, recent papers have used supervised and unsupervised ML techniques for botnet detection. For example, Tegeler et al. [4] used flow-based methods to detect botnets, Choi et al. [5] detected botnet traffic by capturing group activities in network traffic, and Karasaridis et al. [6] developed a K-means based method that employs scalable non-intrusive algorithms to analyze vast amounts of summary traffic data. However, a number of structural challenges increase the complexity of NetFlow data, calling for sophisticated ML techniques—based on statistical insights—for the analysis and modeling of such datasets. For one, obtaining high-quality training data is a known problem; some attempts—such as the CTU project [7]—have been made to obtain sample data for a number of known malwares. Establishing ground truth is also difficult: the process of confirming an IP address as a bad actor is expensive in terms of time and effort, and may even require manual review by a security analyst or deep packet inspection (DPI). Finally, it is challenging to capture the distributional nature of features in raw flow traffic data. For a device- or IP-level analysis, the raw data may contain multiple records, giving a distribution of values for features such as the number of packets or bytes transferred that needs to be succinctly summarized before applying downstream ML techniques.

In past work, Gu et al. [8, 9, 10] used unsupervised methods on flow and some distributional features to detect botnets in NetFlow data, and developed a scalable framework to apply such methods as a filtering step before DPI [11]. In this paper, we present a statistical pipeline to model the IP network traffic for a given day using NetFlow data and to detect botnet attacks, while minimizing the need for expensive techniques such as DPI.
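To make the summarization challenge concrete, the following is a minimal Python sketch (the paper's actual pipeline is in R; the records, IP addresses, and column names below are hypothetical) of collapsing the variable-length set of flow records for each external host IP into a fixed-length vector of quantiles:

```python
import pandas as pd

# Hypothetical per-flow records: multiple rows per external host IP,
# mirroring NetFlow fields such as bytes and packets transferred.
flows = pd.DataFrame({
    "host_ip": ["10.0.0.1"] * 4 + ["10.0.0.2"] * 3,
    "bytes":   [120, 130, 125, 128, 40, 9000, 150000],
    "packets": [2, 2, 2, 2, 1, 60, 900],
})

# Summarize each host IP's feature distribution with a handful of
# quantiles, turning a variable-length set of flows into a
# fixed-length feature vector suitable for a downstream classifier.
qs = [0.25, 0.5, 0.75]
summary = flows.groupby("host_ip")["bytes"].quantile(qs).unstack()
print(summary)
```

Note how the tight distribution of the first (beacon-like) host and the wide-ranging distribution of the second host are both captured by the same fixed set of columns, regardless of how many flow records each IP generated.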
We summarize the flow traffic feature distributions into carefully crafted IP-level feature vectors, then feed these into supervised ML models to predict whether an IP is malicious or benign. While Gu et al. [9] used a similar featurization method in an unsupervised setting, we take a more principled approach, guided by empirical evidence of differences in the flow feature distributions of traffic through C2 vs. benign hosts [12]. To this end, we experiment with a number of ensembling strategies to combine predictions from multiple models. The best performing models in our approach are able to accurately flag malicious IPs ahead of time.

2. Preliminaries
We first introduce a few basic concepts, and give a high-level overview of our ML pipeline.

2.1. NetFlow Data
NetFlow¹ is a network protocol developed by Cisco for collecting, analyzing, and monitoring packet capture data. A fundamental tool for characterizing IP traffic, NetFlow data comprises source and destination IP addresses, packets and bytes transferred, duration, and the IP protocol number used. While there are other components of IP traffic data—such as data from HTTP log files and DNS requests—NetFlow data is easily available, allowing analysts to create a data-driven funnel of highly probable IPs for further, more intensive investigation. Two reasons make it necessary to extract and craft relevant features from NetFlow data. Firstly, NetFlow data is massive in scale: for a single day, the size of flow traffic data passing through a communication network may run into several hundreds of terabytes. Secondly, NetFlow data has a limited number of attributes, as shown in Table 1. These reasons motivate the extraction and aggregation of relevant statistical features for feeding into downstream investigations.

¹ https://www.kentik.com/kentipedia/netflow-overview

Table 1: NetFlow Data Fields
1. Source IP address
2. Destination IP address
3. Source port
4. Destination port
5. Bytes transferred
6. Packets transferred
7. Start Time
8. End Time
9. IP Protocol number
10. Flag

Figure 1: C2-Centric Traffic Flow (devices 1 through n, each connecting to a single external host IP that may be a C2).

2.2. What is a Botnet?
In recent years, botnets have emerged as one of the biggest threats to network security among all types of malware families, since they have the ability to constantly change their attack mechanisms in scale and complexity [13]. A botnet is a network of compromised devices, called bots, and one or more Command & Control (C&C or C2) servers. Generally speaking, a bot could be a PC, a server, an Internet of Things (IoT) device, or any machine with access to the internet. In this type of threat, the orchestrator—called the botmaster—authors a malware that operates on each bot. Devices are infected with the malware in several ways, such as "drive-by downloads": the (unintentional) download of malware as a result of visiting a website or opening an infected email. The botnet control system, i.e. the C2 server, is the mechanism used by the botmaster to send commands and code updates to bots, which then conduct the attacks. Due to the prevalence of firewalls, the botmaster cannot contact devices directly. Typically, the bot malware has instructions to contact the C2 to establish communications and to receive instructions on any attacks to be perpetrated. The nature of such attacks varies in scale and sophistication. Examples of attacks by botnets include transmitting malware, using the bots to perform different illegal activities (e.g. spamming, phishing, or stealing confidential information), and orchestrating various network attacks (e.g. DDoS) [14].

2.3. ML pipeline for botnet detection
As noted in Section 1, the most common approach to identifying botnets is to look at individual devices and analyze their traffic with various hosts.
This means analyzing each of the device's connections, as shown in the right panel of Figure 1, for possible malicious traffic. However, the connection traffic between a bot device and the C2 may not look significantly different from other benign traffic for that device, and/or may be only a small portion of its traffic. Therefore, each connection must be analyzed individually. In this paper, we take a C2-centric view of the data instead, analyzing the external host for C2 behavior. Figure 1 shows the traffic between one external host IP address and several devices internal to a carrier network, each having a distinct IP address. We aggregate device traffic for each such (external) host IP address, and use this aggregate data to answer the question: which host IP addresses have traffic that looks like a botnet command and control (C2) pattern? Assuming that a C2 server aims to control a large number of bot devices, such controlling behavior will manifest in the interaction data between a C2 server and its paired devices. Therefore, we can look for the C2 signature as the predominant traffic pattern over all the paired devices. This C2-centric approach allows for richer aggregation, and fewer samples that need to be analyzed, thereby improving the accuracy and scalability of the resulting detection.

We construct features for each host IP from the NetFlow traffic between the host and all of its associated device IPs (see the right panel in Figure 1), then train machine learning (ML) models using the constructed features to predict a host IP address as malicious or benign. Figure 2 shows a simplified view of our pipeline.

Figure 2: The ML Botnet Pipeline

The threat platform monitoring activity in an internet network ingests NetFlow data from multiple traffic domains on a daily basis. We use ML models trained on historical data to predict the label for external host IP addresses.
These models detect IP addresses associated with botnet C2, Trojans, and other dangerous malware families. As discussed in Section 1, a major challenge in network security is knowledge of ground truth, i.e. proof that IP addresses predicted by the ML model as malicious are indeed malicious botnets. A naïve approach would be to check whether these predicted IPs show up in existing lists of malicious IPs in threat platforms, i.e. denylists. The problem with this approach is that the sources of denylists are diverse, ranging from crowdsourced information to analysis of malware samples, and such sources are of variable and not always reliable quality. Due to this limitation, we take a multi-tiered approach to validating the model-generated list of potentially malicious IPs. We first use the network denylists to filter out any known malicious IPs, then pass the remaining list of IPs to a validation engine. The validation engine mechanizes rule-based review processes typically done manually by security analysts. The shorter list passed on from the validation engine finally goes through deep packet inspection (DPI)—the most definitive validation process—to see if these IPs generate alerts associated with well-known vulnerabilities (IPs subject to legal review). The final list of IPs coming out of this process tagged as malicious are considered 'actionable' IPs, ready to be utilized in future security use cases. This three-step process not only saves resources by successively filtering an initial list of suspicious IPs before using the expensive DPI validation technique, but also eliminates possible 'false positives'—IPs that may be allowlisted or sinkholes, or belong to well-known content delivery networks and cloud providers.

3. Materials and Methods
In this section, we give details of the NetFlow data analysis and model building. We start with feature engineering, then describe the modelling steps.

3.1. Data
We used daily flow data processed from numerous traffic domains associated with several classes of service for a telecommunications solution provider. Based on matches from the native threat intelligence platform, which maintains a list of confirmed IP addresses belonging to several malicious botnet families, some IP addresses are labeled 'malicious', and the others 'unknown'. To construct this list, active malware sample traffic traces observed in the network within the past 30 days were used. As the list of IP addresses associated with the malicious families is very small compared to the entire traffic, there was a significant imbalance of class labels. To mitigate the class imbalance, we sampled 1000 IP addresses from the 'unknown' class for every day of the month of December 2021, took all IP addresses associated with the 'malicious' class, and used the traffic flowing through these IPs to construct our training dataset. This hand-constructed training data had ∼17% malicious and ∼83% unknown traffic. All traffic from the subsequent month (January 2022) was used as the test dataset.

3.2. Feature engineering
The first important step in building a predictive model for detecting botnets is the exploration of the input feature space. We need to craft input features that describe the NetFlow traffic well enough to distinguish between malicious and benign IPs. The ability of the engineered feature space to provide pertinent information is critical to the subsequent ML step, as the underlying assumption of the classification models used is that the feature characterizations of malicious botnet and benign NetFlow traffic have different distributions. In previous exploratory work on NetFlow data [12], we discovered traffic traces associated with known botnet families, i.e. 'live' botnet traffic from C2 IP addresses. We refer to such IP addresses as malware samples.
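The per-day rebalancing scheme described in Section 3.1 (keep every malicious IP, downsample the unknown pool) can be sketched as follows; the frame, label values, and sample budget here are hypothetical stand-ins, and the paper's actual pipeline is in R:

```python
import pandas as pd

# Hypothetical day of labeled host IPs: a few 'malicious' rows and a
# much larger pool of 'unknown' rows, mimicking the class imbalance
# in the raw traffic.
ips = pd.DataFrame({
    "host_ip": [f"198.51.100.{i}" for i in range(250)],
    "label": ["malicious"] * 10 + ["unknown"] * 240,
})

# Keep every malicious IP; downsample the unknown class to a fixed
# daily budget (the paper uses 1000 unknown IPs per day; 50 here).
malicious = ips[ips["label"] == "malicious"]
unknown = ips[ips["label"] == "unknown"].sample(n=50, random_state=0)
train_ips = pd.concat([malicious, unknown])

print(train_ips["label"].value_counts())
```

The resulting IP list, repeated over each day of the training month, determines which flow records enter the training set.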
Using the flow data of IP addresses from these malware samples, we uncovered a number of characteristics or signatures that differentiate normal traffic from botnet traffic. Subramaniam et al. [12] presented a comprehensive discussion of feature engineering for NetFlow data to help build an informative feature set for botnet prediction using ML. This set of features can be categorized into two major groups: (1) flow size features, and (2) beaconing features. In addition to these, in this paper we also use distributional features for flow variables, to encode granular information on flow variable distributions.

3.2.1. Flow size features
The first set of statistical features engineered from the NetFlow data are based on flow sizes, which indicate the total number of bytes/packets transferred between the source and destination endpoints for a given flow. In our case, this is the traffic between a single source IP (SIP) and all the devices it communicates with, i.e., aggregate traffic between a SIP and all devices. Our exploratory analysis of flow sizes using live botnet data indicates that the flow size characteristics of C2 servers are significantly different from those of benign servers. The differences can be attributed to several factors, the main one being that botnet traffic tries to maintain a low profile to avoid detection. As a result, botnet flow feature values are usually small, and have minimal variation across time. In contrast, benign flows show more diversity in flow sizes, i.e., assume a wide range of values. Other statistical features we use comprise bytes, packets, duration, bytes-to-packets ratio, and byte and packet rates. Finally, it is possible to infer who initiated the connection—the external host IP or the device—using port information. Thus we include one-hot encoded port indicators as input features.

3.2.2. Beaconing features
Malware downloaded by compromised internal devices or servers displays beaconing behavior, which involves sending short and routine communications to the C2 server. Beaconing signals that the infected device in the internal network is now available and listening to the C2 server for further instructions. We developed a number of features to specifically detect the presence of beaconing activity, confirming that the signaling is active. As an example, from the observed sequence of source IP start times, the inter-arrival times are defined as the differences between start times of successive flows. If the inter-arrival times display a periodic pattern, then a beaconing signal is present; if they are random, it is not. Based on such logic, several statistics computed from the set of inter-arrival times form the basis of the beaconing features.

3.2.3. Distributional features
As indicated earlier, the rationale for our IP-level analysis is the hypothesis that C2 servers demonstrate markedly different flow behavior compared to benign IP addresses. In statistical terms, this means that the distributions of flow features for the two classes are very different. Using only static summary statistics of these distributions, such as mean, median, or standard deviation (as used in the engineered flow and beaconing features above), may not be sufficient to optimally tell apart malicious and benign IPs. For this reason, we craft an additional set of features from quantiles of the IP-level raw flow feature distributions. As an example, consider three input features—packets, bytes, and packets-to-bytes ratio—each of which has multiple observations per IP. Denote their distributions for a device as 𝒟𝑝, 𝒟𝑏, 𝒟𝑟 ∈ P, respectively, where P is the set of all real-valued probability distributions. Assume that whether an IP is malicious or not is a function of these distributions: I(malicious) = 𝑓(𝒟𝑝, 𝒟𝑏, 𝒟𝑟).
This model may be approximated using summary statistics such as the mean 𝜇(·) and standard deviation 𝜎(·) of an IP-level feature distribution: I(malicious) ≃ 𝑓((𝜇(𝒟𝑝), 𝜎(𝒟𝑝)), (𝜇(𝒟𝑏), 𝜎(𝒟𝑏)), (𝜇(𝒟𝑟), 𝜎(𝒟𝑟))). In addition to these somewhat simplistic feature summaries, we use a wider spectrum of distributional features, obtained using quantiles of each feature distribution: I(malicious) ≃ 𝑓(𝐺(𝒟𝑝), 𝐺(𝒟𝑏), 𝐺(𝒟𝑟)), where 𝐺 ≡ (𝜇, 𝜎, 𝑄), and 𝑄 : P ↦→ Rⁿ denotes the vector transformation giving 𝑛 pre-defined quantiles of a distribution. While the transformation 𝑄(·) can be made arbitrarily high-dimensional by taking closely situated quantiles, we found that for our dataset, model performance plateaus below a 5% granularity of quantiles. Consequently, we set 𝑛 = 20, i.e. we consider the 5%, 10%, . . . , 95%, 100% quantiles of the respective flow feature distribution.

Generally, there is significant overlap in feature-level summary statistics across malicious and benign IPs. Using a larger number of quantiles that adapt to the shape of a distribution allows us to tease out the differences between these two classes more accurately. Moreover, some summary statistics such as the standard deviation require a large enough sample size for the calculated value to be usable; as a result, IPs with a smaller number of observations may be dropped from the analysis and/or modeled inaccurately. Quantile-based features do not have this limitation.

Figure 3: Comparison of static (top) vs. quantile (bottom) features for packets-to-bytes ratio.

As an example, consider the plot of the averages and standard deviations of the packets-to-bytes ratio for IPs that are known to be malicious vs. IPs with unknown status, and compare it with the plot of IP-level deciles (quantiles at 10% intervals) of the same feature (Figure 3).
Even though values in the two classes intersect heavily for the static summary statistics, the quantile features differ across classes in the lower and higher quantiles.

3.3. Ensemble Models for NetFlow Data
ML techniques—both supervised and unsupervised—are widely used in cybersecurity. Typically, unsupervised techniques are used on known malware samples (e.g., live botnet traffic) to explore statistical features of the malware. Supervised ML models are ideally trained on high-quality data containing reliable labels (malicious vs benign). The general principle is that, as the training data is labeled, the model will 'learn' from the labeled patterns to build a classifier, which can then be used to predict class labels for IPs in new traffic data. Many researchers have employed such models for botnet detection: the traffic of IP addresses associated with a known set of botnet families is used as training data to learn a prediction function that classifies an IP address as a botnet C2 server or benign. For example, Subramaniam et al. [12] built predictive models using Random Forest and deep learning techniques on flow size and beaconing features, and demonstrated the predictive performance of these models over a one-month period. In this paper, we investigate two novel ideas:

1. The use of distributional features in the models, in addition to the traditional flow features;
2. The use of ensemble methods [15, 16] that combine the predictions of multiple (weak) prediction methods (a bucket of models).

Ensemble models combine or stack several base models together. They are aimed at maximizing the contribution from diverse models to get a wider understanding of the class-distinguishing input features, especially for complex datasets. In theory, ensemble models can improve both the accuracy and stability of predictions over individual models by taking advantage of the underlying differences and strengths of the base models.
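The stacking idea can be sketched in a few lines. The paper's pipeline uses R's caretEnsemble with six base models; the following is an equivalent Python/scikit-learn sketch on synthetic data, with only two base learners and a logistic (GLM-style) meta-learner for brevity:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered NetFlow feature matrix.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Diverse base learners, combined by a GLM-style meta-learner trained
# on their out-of-fold predictions (cv=5).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
print(f"held-out accuracy: {acc:.2f}")
```

Using out-of-fold base-model predictions to fit the meta-learner, rather than in-sample predictions, is what keeps the stacked combination from simply overfitting to its strongest base model.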
Ensemble methods are used extensively in other fields like medicine (e.g. MRI datasets [17]), finance (fraud detection [18]), image analysis (face recognition [19]), and meteorology [16], to name a few. We use a number of ensembling strategies. Firstly, we use ML models that are ensembles by definition: random forest [20, RF] on the base features, RF on PCA-transformed input features, and two versions of boosting [21]: gradient boosting and XGBoost. Secondly, we use two simpler approaches: logistic regression (a linear classifier) and LASSO (regularized regression). Finally, we stack all six of our models using a GLM-based ensembling strategy [22]. Thus, we cover different types of ensembling strategies: parallel combination (RF models), serial combination (boosting models), and a stacked combination of all models.

4. Evaluation
We did all analyses using the R statistical software, utilizing the packages caret [23] and caretEnsemble [22], with 10-fold cross-validation for hyperparameter tuning. For model performance evaluation, we obtain boxplots of performance metrics using bootstrapped samples from the test data (resample size 1000). Figure 4 presents two performance metrics—Area Under the ROC curve (AUC) and sensitivity—for each of our six base models. We compare two feature sets for model evaluation: the conventional flow features related to flow size and beaconing, without and with our novel distributional features for bytes, packets, and bytes-to-packets ratio. Random forest and the boosting methods performed well—XGBoost also being the fastest in terms of computation time—whereas the linear models performed poorly. Random forest on PCA-transformed input features did better than the GLMs. We generally see the inclusion of quantile-based flow features improving model performance for both metrics, the effect being stronger for AUC.
The positive effect of distributional features is pronounced across both metrics for the three best performing models: RF, GBM, and XGBoost. Finally, stacking the predictions from the different classifiers using a simple linear model gives strong overall performance: the average AUC across bootstrapped test sets for the stacked ensemble of all six models was 0.95, and the sensitivity was 0.83. The AUC is comparable to the better performing (RF and boosting) models, while the sensitivity is not as good, potentially because the simpler models (GLM, LASSO, pcaRF) have lower values for this metric.

To understand which features drive the classification of IP addresses, we look at the variable importance plot of the all-features random forest model (Figure 5). The fact that all the top important features belong to the quantile feature set underlines their informativeness in our IP classification scenario. We also observe that most of these top 20 quantiles correspond to one of the tails: 14 of the 20 lie outside the inter-quartile range, i.e. below the 25% or above the 75% quantile.

5. Conclusion
Detecting botnet and other malware activity is a challenging task even with good quality labeled datasets. The framework used in this paper, namely utilizing a combination of distributional characteristics of NetFlow variables in conjunction with the stacking of multiple ML models, provides a useful strategy for detecting malicious activity in IP traffic. The advantages of distributional variables are twofold: the ease of computing them versus traditional NetFlow variables, and the flexibility of choosing more or fewer quantiles based on computational constraints. Some NetFlow features involve computing volumes in both directions, i.e., the originating and terminating directions; these are computationally expensive.
The GLM-based ensemble method, used here in the context of botnet detection for the first time, improved accuracy and also provided stability for the predictions. Results confirm that the Super Learner (stacked model) provides better accuracy than any of the individual models. In future work, deep learning methods need to be evaluated against 'traditional' ML models for the current task. Further investigations are necessary to determine the performance of more complex ensembling methods, such as Bayesian model averaging.

Figure 4: Comparison of model performance using AUC (top) and sensitivity (bottom).

Our current labeled data contains a diverse mix of various malware families. It would be of interest to train separate ML models for specific families of malware samples and identify the family-specific flow features instrumental to the respective prediction models. Finally, existing research on adversarial tactics to fool statistical malware detection methods based on flow data [24, 25] provides motivation to perform similar analyses of our featurization technique and to devise predictive models robust to such adversarial attacks.
Figure 5: Top 20 important features in the all-feature RF model. Variable importances are on a percentile scale.

6. Acknowledgements
We thank Robert Archibald, Richard Hellstern, and Craig Nohl (AT&T) for providing advice, helpful comments, and technical expertise related to network security.

References
[1] L. Cheng, F. Liu, D. D. Yao, Enterprise data breach: causes, challenges, prevention, and future directions, WIREs Data Mining and Knowledge Discovery 7 (2017) e1211.
[2] A. Handa, A. Sharma, S. K. Shukla, Machine learning in cybersecurity: A review, WIREs Data Mining and Knowledge Discovery 9 (2019) e1306.
[3] M. Evangelou, N. Adams, Predictability of netflow data, 2016 IEEE Conference on Intelligence and Security Informatics (ISI) (2016) 67–72.
[4] F. Tegeler, X. Fu, G. Vigna, C. Kruegel, Botfinder: Finding bots in network traffic without deep packet inspection, in: Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, 2012, pp. 349–360.
[5] H. Choi, H. Lee, H. Kim, Botgad: detecting botnets by capturing group activities in network traffic, in: Proceedings of the Fourth International ICST Conference on COMmunication System softWAre and middlewaRE, 2009, pp. 1–8.
[6] A. Karasaridis, B. Rexroad, D. A. Hoeflin, et al., Wide-scale botnet detection and characterization, HotBots 7 (2007) 7–7.
[7] S. Garcia, M. Grill, J. Stiborek, A. Zunino, An empirical comparison of botnet detection methods, Computers and Security Journal 45 (2014) 100–123.
[8] G.
Gu, et al., Bothunter: Detecting malware infection through ids-driven dialog correlation, in: USENIX Security Symposium, volume 7, 2007. [9] G. Gu, R. Perdisci, J. Zhang, W. Lee, Botminer: Clustering analysis of network traffic for protocol- and structure-independent botnet detection, in: Proceedings of the 17th Conference on Security Symposium, SS'08, USENIX Association, USA, 2008, pp. 139–154. [10] G. Gu, et al., Botsniffer: Detecting botnet command and control channels in network traffic, in: NDSS, 2008. [11] J. Zhang, X. Luo, R. Perdisci, G. Gu, W. Lee, N. Feamster, Boosting the scalability of botnet detection using adaptive traffic sampling, in: Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security, ASIACCS '11, Association for Computing Machinery, New York, NY, USA, 2011, pp. 124–134. [12] G. Subramaniam, H. Chen, R. Varadhan, R. Archibald, Network security modeling using netflow data: Detecting botnet attacks in ip traffic, 2021. [13] S. S. Silva, R. M. Silva, R. C. Pinto, R. M. Salles, Botnets: A survey, Computer Networks 57 (2013) 378–403. [14] T. Vu, S. Nam, M. Stege, et al., A survey on botnets: Incentives, evolution, detection and current trends, Future Internet 13 (2021). [15] D. Opitz, R. Maclin, Popular ensemble methods: An empirical study, Journal of Artificial Intelligence Research 11 (1999) 169–198. doi:10.1613/jair.614. [16] T. Gneiting, A. E. Raftery, Weather forecasting with ensemble methods, Science 310 (2005) 248–249. [17] M. T. El-Melegy, K. M. Abo El-Magd, S. A. Ali, K. F. Hussain, Y. B. Mahdy, Ensemble of multiple classifiers for automatic multimodal brain tumor segmentation, in: 2019 International Conference on Innovative Trends in Computer Engineering (ITCE), 2019, pp. 58–63. [18] S. Bagga, A. Goyal, N. Gupta, A. Goyal, Credit card fraud detection using pipeling and ensemble learning, Procedia Computer Science 173 (2020) 104–112.
International Conference on Smart Sustainable Intelligent Computing and Applications under ICITETM2020. [19] K. Li, L. Wang, Ensemble methods of face recognition based on bit-plane decomposition, in: 2009 International Conference on Computational Intelligence and Natural Computing, volume 1, 2009, pp. 194–197. [20] L. Breiman, Random forests, Machine Learning 45 (2001) 5–32. [21] J. H. Friedman, Greedy function approximation: a gradient boosting machine, Annals of Statistics 29 (2001) 1189–1232. [22] Z. Mayer, A Brief Introduction to caretEnsemble, 2019. URL: https://cran.r-project.org/web/packages/caretEnsemble/vignettes/caretEnsemble-intro.html. [23] M. Kuhn, Building predictive models in r using the caret package, Journal of Statistical Software, Articles 28 (2008) 1–26. [24] M. Rigaki, S. Garcia, Bringing a gan to a knife-fight: Adapting malware communication to avoid detection, in: 2018 IEEE Security and Privacy Workshops (SPW), 2018, pp. 70–75. [25] C. V. Wright, S. E. Coull, F. Monrose, Traffic morphing: An efficient defense against statistical traffic analysis, in: Proceedings of the Network and Distributed System Security Symposium, NDSS 2009, San Diego, California, USA, 8th February - 11th February 2009, The Internet Society, 2009.