Analysing of M-AHIDS with future states on DARPA and KDD99 benchmarks

Mikuláš Pataky and Damas P. Gruska

Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Slovak Republic
{pataky,gruska}@fmph.uniba.sk

Abstract. The second generation of the multi-agent heterogeneous intrusion detection system (M-AHIDS) is a prototype designed to detect untrusted and unusual network behaviour. M-AHIDS is based on online traffic statistics in sFlow format, acquired by a network device running an sFlow agent, and is able to perform real-time surveillance of 10 Gb networks. After an extensive reimplementation, it is also capable of processing offline data from the DARPA Intrusion Detection Evaluation Data Set and the KDD99 Cup data set. Offline data sets are used to allow a fair comparison with other IDSs. The main contribution of the system is the integration of several anomaly detection techniques, a new future state prognostic and a new machinery of multi-agent temporal logic with hybrid argumentation. Every detection technique is represented by a specific autonomous detection agent. At this stage, every agent determines the flow trustfulness from aggregated connections. The anomalies are used as input for the machinery of multi-agent temporal logic, which is represented by the logical agent. M-AHIDS is already partially implemented and has been tested and modified accordingly for more than three years.

1 Introduction

The number of users of the Internet and local networks is increasing every day. Consequently, there are many threats that attempt to gain access to private passwords or data, or to harm users in other ways. Fortunately, the current generation of network devices allows real-time collection of structured snapshots of the traffic on the network. This information is provided by various technologies. The two most widely used are the NetFlow format introduced by Cisco and the sFlow format.
These technologies allow us to observe the individual flows on the network. A flow is a unidirectional component of a TCP connection (or its UDP/ICMP equivalent), defined as a set of packets with identical source and destination IP addresses, ports and protocol, packet size, MAC addresses, switch ports, flags and more. The information provided by NetFlow or sFlow can be used to detect a network attack. The most frequent attacks on networks can be divided into three main classes [1]: attacks that break privacy rules, compromising information confidentiality; attacks that alter information, compromising data integrity; and denial of service attacks (DoS or DDoS attacks), which make a network infrastructure unavailable or unreliable, compromising the availability of the resource.

The protection of networks is therefore more than useful; it has been vital for a long time. This issue requires monitoring of real distributed hosts, of various events and of exchanges between these hosts. A multi-agent system (MAS) is a very effective approach to this kind of problem, as it can integrate many different techniques into one solution.

The aim of this paper is to propose the second generation of the multi-agent system for network intrusion detection, M-AHIDS. The first generation was presented in [2]. This generation is based on several years of experience with developing, improving, implementing, deploying and testing M-AHIDS. The main contribution of the second generation of M-AHIDS is the integration of several anomaly detection techniques, a new future state prognostic and a new machinery of multi-agent temporal logic with hybrid negotiation based on argumentation. Every detection technique is represented by a specific autonomous detection agent, and every agent determines the flow trustworthiness from aggregated connections. Inspiration for our agents came from the CAMNEP project [3, 4].
All CAMNEP agents are more or less separate IDSs, and the CAMNEP project tries to combine their results into more trustworthy results. We have decided to use another approach in our IDS: our agents are as simple as possible. We are also still improving our Web agent, which is unique to the best of our knowledge. The Web agent is based on our past project [5–7] on de-anonymization of Internet users. This project has been deployed on all web pages of Comenius University for more than three years. We can detect ordinary users' behaviour from its data. We used all the collected data for a deep analysis and created the Web agent, which is able to detect a trustworthy host based solely on its activity on the web pages.

We have used another new approach for deciding about intrusions on the basis of the detection agents' knowledge base. For this purpose we have used a specifically developed multi-agent temporal logic (M-ATL). The anomalies are used as input for the machinery of M-ATL and the new version of hybrid argumentation, which are represented by a logical agent. The logical agent is one of the advantages of the system because it has extensive capabilities for making the right decision about intrusions from the detected anomalies. All detected intrusions are past states in M-ATL, and we use newly implemented prediction methods based on regression models of time series for the future states. The regression models are used to compute the future states from the collection of past and current connections.

The most important contributions of our research presented in this paper are the following: improved integration of several anomaly detection techniques in the form of agents; an extension of the machinery of the multi-agent temporal logic and hybrid negotiation about the future state; a major update of the argumentation framework; and a new testing approach based on the offline DARPA Intrusion Detection Evaluation Data Set and KDD99 Cup data set.
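The idea of computing future states from past and current observations with a regression model of a time series can be illustrated with a minimal sketch. The paper does not specify which regression models M-AHIDS uses, so the ordinary least squares linear trend below is only an assumption chosen for simplicity; the function names `fit_linear_trend` and `predict_future_states` and the notion of one numeric observation per time step (for example, a connection count per interval) are hypothetical and not part of M-AHIDS.

```python
from typing import List, Sequence, Tuple


def fit_linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Fit values[t] ~ intercept + slope * t by ordinary least squares.

    Assumption: observations are equally spaced, indexed t = 0..n-1.
    Returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    # Covariance of (t, value) and variance of t, both unnormalised.
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = v_mean - slope * t_mean
    return slope, intercept


def predict_future_states(values: Sequence[float], horizon: int) -> List[float]:
    """Extrapolate the fitted trend to the next `horizon` time steps."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(horizon)]
```

For instance, calling `predict_future_states([10, 12, 14, 16], 2)` extrapolates a series that grows by two per step to the next two steps. A real deployment would likely need a richer model (seasonality, noise handling), which this sketch deliberately leaves out.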
M-AHIDS is partially implemented and tested on the local network of the Department of Applied Informatics. Results obtained on KDD99 are comparable to those of other IDSs.

The paper is organized as follows: Section 2 gives an overview of IDSs and selected existing solutions and approaches; Section 3 proposes the architecture of the detection system; Section 4 gives a detailed description of all agents in M-AHIDS; Section 5 presents an overview of the case study, tests and results.

2 Intrusion detection systems

An intrusion detection system (IDS) is software, hardware or a combination of both used to detect an intruder's activity. The basic characteristic of an IDS [8] is neutralizing illegal intrusion attempts in real time. Consequently, it must run constantly on a host or in a network. There are many types of IDS, and each has advantages and disadvantages. Their strengths and weaknesses depend mostly on the way they recognize threats. The two main approaches to intrusion detection are [1]: behaviour-based intrusion detection, which discovers intrusive activity by comparing a user's or system's behaviour profile with a normal behaviour profile; and knowledge-based (signature-based) intrusion detection, which detects intrusions by comparing the parameters of a user's session with known attack patterns stored in a database.

In recent years, several new approaches to IDS have been published. Certain approaches have been identified as relevant for our project. The first, a multi-agent distributed IDS (DIDS) model based on the BP neural network, adopts the modes of distributed detection and distributed response [9]. The second, emulation-based network intrusion detection, has been devised to detect the presence of shellcode in network traffic by trying to execute (portions of) the network packet payloads in an instrumented environment and checking the execution traces for signs of shellcode activity [10].
The third, a multi-stage approach to constructing hierarchical classifiers, combines process mining, feature extraction based on temporal patterns and construction of classifiers based on a decision tree [11]. The fourth, content anomaly detection (CAD), models the payloads of traffic instead of the higher-level attributes. Zero-day attacks then appear as outliers to properly trained CAD sensors [12]. The fifth approach detects TCP connection based attacks using certain data mining algorithms [13]. The J-48 decision tree algorithm and Naive Bayes classifiers were trained on 19 features selected from the KDD99 data set. The features were chosen by Markov blanket and Pearson correlation. The approach could detect about 74% of novel attacks with the 19 features.

3 M-AHIDS

The following section briefly presents the foundations of the second-generation network intrusion detection multi-agent system M-AHIDS. The design of the system arose from theoretical research as well as from practical experience obtained by testing for more than three years.

[Figure: M-AHIDS architecture diagram. Recoverable component labels: logical decision agent, preprocessing, in-memory database, cycle in-memory database, Web agent, detection agents 1 to n, results database, network administrator, de-anonymization database, and a web server hosting the de-anonymization scripts (Java-based applet, PHP script, JavaScript/AJAX, Flash script, CSS history) that collect fingerprints.]