Multi-Agent Case-Based Reasoning: a Network
Intrusion Detection System
Jakob Michael Schoenborn1,2 , Klaus-Dieter Althoff1,2
1
    University of Hildesheim, Samelsonplatz 1, 31141 Hildesheim, Germany
2
    German Research Center for Artificial Intelligence (DFKI), Trippstadter Str. 122, 67663 Kaiserslautern, Germany


                                         Abstract
                                         We propose a multi-agent case-based reasoning system to detect malicious traffic in a network. We
                                         introduce ten topic agents: nine covering different attack categories and one covering normal,
                                         benign traffic. Using the four knowledge containers, we fill our case base with the labeled training data
                                         of the commonly used UNSW_NB15 data set, in total 82332 cases with 47 (mostly numeric) attribute
                                         features. We calculate average values for each attribute and search for outliers to identify characteristic
                                         attributes for each attack category, increasing the weights of those attributes in the amalgamation function.
                                         For the local similarities, we define polynomial similarity functions whose similarity decreases heavily for
                                         differing attribute-value pairs between case and query, depending on the range of the attribute values.
                                         Purpose. The proposed system aims to detect malicious traffic, such as denial of service attacks,
                                         and to alert the security engineer of a company or even an individual person. The system can be
                                         integrated into existing intrusion detection systems, support regular log analysis, or be used for
                                         forensic analysis.
                                         Findings. We were able to detect Generic and Fuzzers attacks with a high true-positive rate.
                                         With additional adjustments, we are confident that more attack categories can be detected successfully.
                                         Implications and value. Despite detecting only two out of nine attack categories, we consider this approach
                                         an important step in the right direction, with possible improvements and opportunities for
                                         fruitful synergies and discussions in the security domain and the case-based reasoning community.

                                         Keywords
                                         Case-based Reasoning, Intrusion Detection System, IT-Security, Multi-agent system, SEASALT, myCBR



1. Introduction
Intellectual property needs to be protected from unauthorized access. As more and more
companies go online through Industry 4.0 standards to increase globalization, the number of
cybercrimes increases as well. The COVID-19 pandemic in particular forced most organizations
into digitisation, for example by allowing their employees to work from home. Properly
integrating digitisation into a company environment takes time and effort, and unfortunately
security is often not the top priority during this phase. This development offers a wider range
of possible targets for cybercriminals to move through a company’s network. It is not uncommon
for companies - especially larger IT companies - to have several thousand known
vulnerability issues. However, the upper management

LWDA’22: Lernen, Wissen, Daten, Analysen. October 05–07, 2022, Hildesheim, Germany
Envelope-Open schoenb@uni-hildesheim.de (J. M. Schoenborn)
GLOBE https://github.com/jmschoenborn (J. M. Schoenborn)
Orcid 0000-0001-9669-8148 (J. M. Schoenborn)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
might only provide budget to fix a small percentage of those issues, since investment into
security does not necessarily create directly visible value. Certainly, there are different scoring
systems to rate the severity of a given security issue, such as CVSS1 . Using such scores, the most
severe security issues can be identified and ranked. However, the given budget might not be
enough to fix all of them, leaving the choice of which issues to fix to the security engineers.
On the one hand, it might be reasonable to fix SQL injection (SQLi) and Cross Site
Scripting (XSS) issues first, as they might cover the largest share of the critical findings2 . On the
other hand, leaving certain other known vulnerabilities open might be very risky, potentially
resulting in an economic shutdown of the company. To ease this decision making process,
more information about the potential attacker could be helpful.
   This contribution is the first in a series of necessary steps to reach the previously described
goal of identifying possible attackers, along with explanations of the given decision. Based on
the current state of the art, case-based reasoning (CBR) seems to be a promising candidate
as an addition to current intrusion detection systems: a relatively small case base can already
be sufficient for detecting novel attacks, which are usually only slight adjustments of known
attacks. CBR is a methodology that usually cycles through four steps: retrieve, reuse, revise,
retain. Knowledge representation is supported by four knowledge containers: vocabulary,
similarity measures, adaptation knowledge, and case base. Generally, CBR follows the paradigm
“Similar problems have similar solutions”, thus retrieving experience from old situations (cases)
to solve a newly occurring problem. The possibility of modularization by using the SEASALT
architecture [1] and of initializing CBR agents whenever needed makes it possible to adjust
resources according to incoming attacks - the more incoming data and attacks, the more agents
can be initialized.
   Nevertheless, there is unfortunately only very limited research within the CBR community
on this area, even though security is a domain that affects every one of us - not only
companies, but also individuals. A recent study of the International Telecommunication
Union reports an increase of 800 million internet users from 2019 to 2021, resulting in
4.9 billion total users - or 63 % of the world population - with the trend still going upwards [2].
We are confident that CBR can be a valuable addition to existing security mechanisms to protect
these users, as well as persons who do not use the internet but are indirectly dependent on,
for example, their bank having proper security mechanisms.
   The next section describes related work to position our contribution among other approaches
in a similar direction. We present our concept and the data set used in section 3, followed by a
description of the practical implementation in section 4. Finally, we evaluate and discuss our
results in sections 5 and 6.




    1
        Common Vulnerability Scoring System, see https://nvd.nist.gov/vuln-metrics/cvss, last validation: 03/18/2022
    2
        see https://owasp.org/www-project-top-ten/, last validation 03/18/2022
2. Related Work
As the meaning of the term security spans various areas, we used different combinations
of keywords such as case-based reasoning, security, intrusion, detection, system, network,
malware, malicious, traffic, ... to obtain a comprehensive overview of the current state of the art
regarding the usage of case-based reasoning in the IT-security domain. In the following, we
present the literature most similar to our work, sorted into the categories Profiling, General
Intrusion Detection, and Attack-centric research, all of which also apply to our contribution.
   Profiling. Regarding our long-term goal of matching security issues to potential attacker
groups, Kapetanakis et al. started in 2014 by ‘examining whether a CBR approach can help security
and forensic investigators to profile human attackers with regards to their behavioural, demographic
and technical characteristics’ [3]. This data has been used to formulate cases of attackers
and also cases of attacks. In their experiments, in which 87 individuals attempted to shut down
targeted systems, they achieved an average classification rate of 69 %. Han et al. developed a tool
called ‘Web-Hacking Profiling using CBR’ [4]. The mostly South Korean authors investigated
attacks suspected to originate from North Korean actors and claim to have found evidence linking
multiple attacks with the same signatures to North Korea. However, only very few
details about the implementation of the system are given and it is not open source - the realization
is questionable in multiple regards.
   General intrusion detection. El Ajjouri et al. suggest a case retrieval implementation for
intrusion detection based on multi-agent case-based reasoning [5], using jColibri and beginning with
an initial set of five intrusion cases. Similar to Han et al., the case structure focuses primarily
on the protocol, IP addresses, and packet contents. Here, it should be noted that over time,
different actors may have the same IP address3 . In this paper, “agents” rather focus on
different tasks in the overall system (sniffer-, preprocessor-, filter-, decider-, generator-, CBR
agent) than on the attacks themselves - resulting in static distributed problem solving
rather than a multi-agent system. Additionally, the weight for each attribute is the same, and each
attribute is only checked for whether the value of the case and the query are equal, missing
the finesse of CBR. Erbacher and Hutchinson provide a CBR system for automated cyber
security report generation that additionally reports hostile actors who try to hide from
established defensive measures [6]. Wang et al. apply graph theory
and case templates to represent vulnerabilities and reason about attack paths using the graph
structures [7]. Creating this structure also allows assessing the current security situation
of the network [7].
   Attack-centric research. Long et al. focused on the detection of distributed denial of service
(DDoS) attacks [8], similar to our DoS agent. The authors use multi-sensor data included
in two different DARPA data sets, and their results show that their CBR approach is
effective - it could also be integrated into our proposed system. Slightly out of our scope,
but still relevant: Abutair and Belghith provide a CBR system that detects at least 96 % of phishing
emails [9], i.e., emails that try to trick the receiver into clicking a malicious link and thus installing
malware on their computer. Phishing emails are a subcategory of social engineering, which
Lansley et al. [10] focused on, achieving results similar to Abutair and Belghith.
    3
        IPv6 will strongly reduce the likelihood of this happening.
3. Concept
3.1. Dataset
To choose a data set, we consulted the survey of Ring et al. on 34 network-based intrusion detection
data sets [11]. Their general recommendation mentions four possible data sets, of which
CICIDS 2017 and UNSW_NB15 offer a wide range of attack scenarios [11], which fits our
needs. While the former contains more detailed metadata, we chose the latter, as CBR does not
necessarily need a large amount of data to provide proper classifications.
   UNSW_NB15 has been created by N. Moustafa and J. Slay and spans 47 different attributes,
which can be sub-categorized into basic features, connection features, content features, time
features, additional generated features, and labeled features. Attack traffic is labeled
as 1, normal traffic as 0; the attack categories are described in Table 1 [12].
   The dataset is split into training data (82332 packets, thus 82332 cases in total) and testing
data (175341 packets). For a detailed description of the 47 features, we refer to the original
publication by Moustafa and Slay [12]. Table 2 shows an excerpt of the calculated average
values of the UNSW_NB15 training data set.

       Fuzzers       the attacker attempts to discover security loopholes in a network by feeding
                     it with massive amounts of random data to make it crash.
      Analysis       a variety of intrusions that penetrate web applications via ports, emails, and
                     web scripts.
      Backdoor       a technique for stealthily bypassing normal authentication, securing unautho-
                     rized remote access to a device.
        DoS          an intrusion which keeps computer resources extremely busy in order to
                     prevent authorized requests from accessing a device.
       Exploit       a sequence of instructions that takes advantage of a vulnerability caused by
                     unintentional behavior on a host or network.
       Generic       a technique that works against every block cipher to cause a collision, regard-
                     less of the configuration of the block cipher.
   Reconnaissance    can be defined as a probe; an attack that gathers information about a computer
                     network to evade its security controls.
      Shellcode      an attack in which the attacker injects a small piece of code starting from a
                     shell to control the compromised machine.
       Worms         an attack whereby malicious code replicates itself in order to spread to other
                     computers. Often, it uses a computer network to spread itself.
Table 1
Description of the attack categories based on Moustafa and Slay [12].


3.2. CBR agents and their knowledge containers
Observing network traffic to identify a possible attacker can be a tedious task, as thousands of
packets per second rush through the network. Therefore, in terms of efficiency and
scalability, we suggest a multi-agent system with at least one agent per attack category and one
agent for normal data traffic. With the given data set, this leaves us with ten agents - with the
        Attribute        Overall       Backd.        Fuzzer        Generic       Shellc.        Worm
        protocol       tcp(43095)    unas(206)     tcp(3713)     udp(18303)     tcp(139)        tcp(38)
          state       FIN(39339)      INT(522)     FIN(3703)     INT(18325)    FIN(192)        FIN(38)
        duration            1.01         0,93          2,13           0,07         0,36           1,06
         sbytes          7994,44       581,32        5197,3         552,8        542,11         1819,5
         dbytes         13235,35       168,13        513,29        1830,8        149,58       68940,28
            sttl          180,97        248,1        253,98         251,64         254            254
            dttl           95,72        22,67        154,36           7,04       128,67         217,64
           sloss            4,76         0,22          3,41          0,22         1,02            1,82
          dloss             6,31         0,22          1,18          0,74         0,52           26,78
         service        -(47153)        -(572)       -(5527)     dns(18162)      -(378)        http(34)
          sload        65447 384     123637 232   108336 848      92705 224   122868 016      69576 520
          dload        630569,07       1694,86      3372,86        6044,48      2580,43       141641,49
          spkts            18,67         4,39          11,8            2,8        6,07           16,78
          dpkts            17,55         0,84           5,8          1,64         3,36           57,32
           rate           82403       154631,5      67530,1      195076,07     28347,25        20921,27
           swin           133,46        22,31        156,19           7,03        130,2         220,23
          dwin           128,29         22,31        156,19           7,03        130,2         220,23
          stcpb       1084642 816    185336 576   1340419 072     60907 388   1103935 232    1738327 296
          dtcpb       1073468 224    195774 848   1299013 760     59711 008   1119258 880    1699302 272
         smean            139,53         98,9       214,88             65        123,22         186,91
         dmean            116,28         9,66          36,5          12,35        22,84         238,28
      trans_depth           0,1          0,02          0,03          0,01            0           0,62
        response         1595,34         0,39          4,73        191,85            0         24569,3
            sjit          636,04        464,01     26552,59         175,94       2299,7        2969,66
            djit          535,18        13,14       1072,12          49,35        85,47         343,46
         sinpkt           755,39        38,25        378,72            3,1        37,96          53,52
         dinpkt          121,71          7,47       402,84            2,56        52,73          66,38
          tcprtt           0,06          0,01           0,1          0,01         0,06           0,12
         synack             0,03         0,01          0,05          0,01         0,03           0,07
         ackdat             0,03         0,01          0,05          0,01         0,04           0,06
       ct_srv_src           9,55        10,91          7,79          23,1           2,6           1,69
      ct_state_ttl          1,37         1,95          1,41          1,98           1,5           1,14
       ct_dst_ltm           5,75         4,14          2,62         15,45          1,35           1,19
   ct_src_dport_ltm        4,93           3,9          2,45         15,35            1            1,12
   ct_dst_sport_ltm         3,67         3,89           1,7         11,43            1              1
    ct_dst_src_ltm          7,46         4,21          3,25          3,92         1,19            1,03
      is_ftp_login         0,01            0           0,01             0            0              0
      ct_ftp_cmd            0,01           0           0,01             0            0              0
  ct_flw_http_mthd          0,13         0,02          0,03          0,01            0           0,62
       ct_src_ltm           6,47         5,93          3,42         15,73          1,68           1,98
       ct_srv_dst          9,17         10,11          6,92         23,05          1,59            1,5
       is_sm_ips            0,02           0             0              0            0              0
Table 2
Excerpt of the UNSW_NB15 training data set based on [12], showing calculated average values and
highlighting the highest values. Average values are shown for numeric attributes and the most frequent
value for string attributes (such as protocol, service, and state, with counts in brackets), sorted by attack
category. Italic values are not the highest, but still distinct in contrast to other attack categories.
possibility of multithreading, increasing the number of agents based on the amount of incoming
traffic for scalability. The agents can easily be incorporated into multi-agent frameworks such
as the SEASALT architecture [1].
   We use case-based reasoning agents, each containing four knowledge containers according
to M. M. Richter [13]: vocabulary, similarity measure, adaptation knowledge, and case base.
   In terms of vocabulary structure, we use an attribute-value representation, as the measured
data contains 35 attributes in addition to 12 derived attributes (see section 3.1). No set of
attributes contains unknown values, thus only complete situations are evaluated. Correlations
between individual attributes have not been detected yet. However, attributes do correlate
with attack categories, which is covered next in the similarity measure container.
   Following the weighted Hamming similarity measure

                            𝑠𝑖𝑚(𝑞, 𝑝) = ∑(𝑔𝑖 × 𝑠𝑖𝑚𝑖 (𝑞𝑖 , 𝑝𝑖 ) | 1 ≤ 𝑖 ≤ 𝑛)                        (1)

as Richter and others suggested, we utilize the local-global principle [14, 13, 15]. For the local
measures 𝑠𝑖𝑚𝑖 , we inspect the attributes 𝐴𝑖 based on their minimum and maximum values and
calibrate a symmetrical polynomial function whose similarity decreases heavily for differing
attribute values, depending on the variability of the attribute: the narrower the value range of
an attribute, the more steeply its similarity function decreases. For the amalgamation function,
we set values for the non-negative real weight vector coefficients 𝑔 = (𝑔1 , ..., 𝑔𝑛 ), normalized to
∑ 𝑔𝑖 = 1 [13].
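
   As an illustration (not the actual myCBR implementation), the following Java sketch computes
the weighted global similarity of Equation (1) with a symmetrical polynomial local measure; the
exponent and the normalization by the attribute range are simplifying assumptions on our side.

import java.util.Map;

// Illustrative computation of the weighted global similarity (Equation 1).
// The weights are assumed to be normalized so that they sum to 1.
public final class GlobalSimilarity {

    /** Symmetrical polynomial local similarity for numeric attributes:
     *  1 at equality, decreasing faster the narrower the attribute range is. */
    static double localSim(double query, double caseValue, double min, double max, double exponent) {
        double range = Math.max(max - min, 1e-9);            // avoid division by zero
        double normalizedDiff = Math.abs(query - caseValue) / range;
        return Math.pow(1.0 - Math.min(normalizedDiff, 1.0), exponent);
    }

    /** Weighted sum of local similarities over all numeric attributes. */
    static double globalSim(Map<String, Double> query, Map<String, Double> storedCase,
                            Map<String, Double> weights, Map<String, double[]> minMax,
                            double exponent) {
        double sim = 0.0;
        for (var entry : weights.entrySet()) {
            String attr = entry.getKey();
            double[] range = minMax.get(attr);                // [min, max] of the attribute
            sim += entry.getValue()
                 * localSim(query.get(attr), storedCase.get(attr), range[0], range[1], exponent);
        }
        return sim;                                           // in [0, 1] if the weights sum to 1
    }
}
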
   For the values of 𝑔, we calculate the average value of each attribute over the whole data
set and also the average value filtered by each attack category. This enables us to identify
attributes whose values seem to hint at a certain attack. For example, spkts (=‘source packets’)
depicts the source-to-destination packet count with the following calculated average values for
each attack category, to be read as: “For an exploit attack, 37,7 packets have been sent on
average from the source to the destination”:

    AVG     Analysis    Backd.     DoS    Exploit     ...   Normal    Recon   Shellcode    Worm
    18,67    3,12        4,39      28,9    37,7       ...    6,97      6,97     6,07       16,78
   While the average over the whole data set is 18,67 and other attacks show lower values,
Exploit stands out with an average value of 37,7 packets from source to destination. This
confirms the intuitive expectation of an exploit attack: in the IT-security context, to exploit
means to systematically abuse known security issues of a given system. However, the attacker
first needs to test which security issues the target might have - resulting in multiple requests and
consequently an increased number of packets running from source to destination. Therefore,
spkts receives a higher weight than other attributes for the exploit agent. The more distinct an
attribute value, the higher the weight.
   Similarly, this holds true for the denial of service (DoS) attack: with a value of 28,9, it is
also distinct enough from the other attack categories, which range on average between 2,8 and
16,78. Therefore, we are also able to identify attribute values which are not the maximum but
still unique to a certain attack category (highlighted in italics in Table 2) - and use this
information to increase the weight of the given attribute for the corresponding agent.
   We repeat this process for each agent and each attribute.
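
   One possible heuristic to derive such weights automatically from the averages in Table 2 is
sketched below; the relative-deviation formula is only an illustration of the principle "the more
distinct an attribute value, the higher the weight" and not necessarily the weighting used in the
prototype.

import java.util.HashMap;
import java.util.Map;

// Heuristic sketch: derive attribute weights for one topic agent from how far
// the category-specific average deviates from the overall average (cf. Table 2).
public final class WeightHeuristic {

    static Map<String, Double> deriveWeights(Map<String, Double> overallAvg,
                                             Map<String, Double> categoryAvg) {
        Map<String, Double> raw = new HashMap<>();
        for (String attr : overallAvg.keySet()) {
            double overall = overallAvg.get(attr);
            double category = categoryAvg.getOrDefault(attr, overall);
            // relative deviation as a measure of "distinctness" (assumption of this sketch)
            double distinctness = Math.abs(category - overall) / (Math.abs(overall) + 1e-9);
            raw.put(attr, distinctness);
        }
        double sum = raw.values().stream().mapToDouble(Double::doubleValue).sum();
        Map<String, Double> weights = new HashMap<>();
        raw.forEach((attr, d) -> weights.put(attr, sum > 0 ? d / sum : 1.0 / raw.size()));
        return weights;                                   // normalized so that the weights sum to 1
    }
}
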
   Each agent is trained to detect its respective attack, i.e., a DoS agent only contains cases
labeled as denial of service attacks. Thus, the case base contains the experience from the
training data set. We store each line of the data set as a case, resulting in 82332 cases. However,
there is still room for improvement regarding two conflicting goals: keeping the case
base as large as possible for increased competence while keeping it as small as
possible for better efficiency. We discuss this problem and the still missing adaptation
knowledge further in section 6.
   For each packet in the testing data set, each agent ‘votes’ by submitting its 𝑛 most similar
cases to a coordination agent. For now, it is left open for discussion in section 5 whether
𝑛 should be 1, submitting only the most similar case, or whether the average similarity of
𝑛 > 1 cases should be calculated to reduce the risk of outliers. For our experiments, we choose
𝑛 = 10 to remove outliers and to gain insight into whether the similarity of the other similar
cases decreases as expected. The votes with the highest similarities are reported to the (human) user.
After receiving the results, the user may decide which agent is ultimately correct - leaving the
responsibility and legal liability to the human user - and might choose to take further actions
to stop the attack, such as blocking the source IP address of the potential attacker.


4. Implementation
We implemented the system described in section 3 using myCBR 3.4 and the programming
language Java. MyCBR is an open-source similarity-based retrieval tool and software develop-
ment kit (SDK)4 that has been further developed by students of the University of Hildesheim
and by the authors, hence the increased version number. MyCBR 3.4 and the whole prototype
described in this contribution are available under the LGPL licence on GitHub5 .
   Figure 1 provides a brief overview. For simplicity of presentation, we do not list
the parameters and return types of the functions used. First, the user is provided
with a simple graphical web interface asking to import either training or testing data. Either
way, the class Stats provides the knowledge engineer with average statistics (among other
functions) on the data set by printing the statistics to the IDE console or a log file -
especially important for inspecting the training data.
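
   The following sketch illustrates, in simplified form, the kind of statistics the Stats class
computes (cf. Table 2): per attack category, the average of a numeric attribute and the most
frequent value of a string attribute. The record layout and method names are illustrative and not
taken from the actual implementation in the repository.

import java.util.*;
import java.util.stream.Collectors;

// Simplified sketch of the statistics step: per attack category, compute the
// average of a numeric attribute and the most frequent value of a string
// attribute (as summarized in Table 2). The row layout is an assumption.
public final class StatsSketch {

    record Row(String attackCat, double spkts, String proto) { }

    static Map<String, Double> avgSpktsPerCategory(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                Row::attackCat, Collectors.averagingDouble(Row::spkts)));
    }

    static Map<String, String> mostFrequentProtoPerCategory(List<Row> rows) {
        Map<String, Map<String, Long>> counts = rows.stream().collect(Collectors.groupingBy(
                Row::attackCat, Collectors.groupingBy(Row::proto, Collectors.counting())));
        Map<String, String> result = new HashMap<>();
        counts.forEach((cat, protoCounts) -> result.put(cat,
                Collections.max(protoCounts.entrySet(), Map.Entry.comparingByValue()).getKey()));
        return result;
    }
}
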
   In case of importing the training data set, new agents are initialized with the given training
data. As described in section 3, each initialized agent learns only cases of its corresponding attack
category. Each topic agent, such as the analysis agent, backdoor agent, DoS agent, ..., extends
the abstract class Agent, which forces it to implement the methods initProject(),
initCaseBase(), addCases(), changeWeights(), startQuery(), and print(). Initializing an
agent makes the agent create a project, which in turn creates the four knowledge containers,
in particular defining the similarity measures. For String values, such as protocol, service, and
state, the Levenshtein similarity function is used. For all other (numeric) attributes, a
symmetrical polynomial function is established (according to section 3.2). Afterwards,
the case base is initialized and assigned to the project. Adding the cases (lines of
the training data) to the case base and finally exporting the agent to the local storage disk
finalizes the initialization process of the agent. Using the exported files allows us to load the
agent for the testing data set. The agents can easily be adjusted to fit other training data
sets as well.

    4
        see http://mycbr-project.org/index.html
    5
        see https://github.com/jmschoenborn
Figure 1: Overview of the implementation, including 10 topic agents.


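   The following Java skeleton illustrates the agent hierarchy described above; the method bodies,
parameters, and the DoSAgent example are placeholders, while the actual implementation including
the myCBR calls is available in the GitHub repository.

// Illustrative skeleton of the agent hierarchy described above. The myCBR
// calls are omitted; see the GitHub repository for the actual implementation.
public abstract class Agent {
    protected final String attackCategory;

    protected Agent(String attackCategory) {
        this.attackCategory = attackCategory;
    }

    public abstract void initProject();      // create project and knowledge containers
    public abstract void initCaseBase();     // create and attach the case base
    public abstract void addCases();         // add all training rows of this category
    public abstract void changeWeights();    // set attribute weights (amalgamation function)
    public abstract void startQuery();       // retrieve the n most similar cases for a query
    public abstract void print();            // print statistics / retrieval results
}

class DoSAgent extends Agent {                // one topic agent per attack category
    DoSAgent() { super("DoS"); }

    @Override public void initProject()   { /* define similarity measures here */ }
    @Override public void initCaseBase()  { /* allocate the case base to the project */ }
    @Override public void addCases()      { /* add only cases labeled "DoS" */ }
    @Override public void changeWeights() { /* increase weights of distinct attributes */ }
    @Override public void startQuery()    { /* submit the 10 best cases to the coordinator */ }
    @Override public void print()         { /* log results */ }
}
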
   In case of importing the testing data set, the already established agents are activated by the
coordination agent. Additionally, the user may provide a positive number 𝑎 for the minimum
number of different attack categories that should be considered and a positive number 𝑐 ≥ 𝑎
for the number of cases that should be presented. This gives the user a broader
picture of the similarity distribution across multiple attack categories, preventing ambiguous
results from being missed. For each line (case) of the testing data set, the agents provide their 𝑛 best
matching cases and the coordination agent presents the best 𝑐 cases among 𝑎 attack categories
to the user. Figure 2 shows the end result after the voting phase: the 4 most similar
distinct attack categories along with the 10 best cases overall.
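
   The selection of the best 𝑐 cases among 𝑎 distinct attack categories can be sketched as follows;
this mirrors the behavior visible in Figure 2, but the concrete two-pass procedure is an assumption
of this sketch rather than the exact implementation.

import java.util.*;

// Sketch of the coordination step: from the pooled votes of all topic agents,
// first pick the best case of each distinct attack category until a categories
// are covered, then fill up with the overall best cases until c cases are shown.
public final class CoordinationSketch {

    record Vote(String attackCategory, int caseId, double similarity) { }

    static List<Vote> select(List<Vote> pooledVotes, int a, int c) {
        List<Vote> ranked = new ArrayList<>(pooledVotes);
        ranked.sort(Comparator.comparingDouble(Vote::similarity).reversed());
        List<Vote> result = new ArrayList<>();
        Set<String> covered = new HashSet<>();
        // pass 1: best case of each distinct category until a categories are covered
        for (Vote v : ranked) {
            if (covered.size() >= a) break;
            if (covered.add(v.attackCategory())) result.add(v);
        }
        // pass 2: fill the remaining slots with the overall most similar cases
        for (Vote v : ranked) {
            if (result.size() >= c) break;
            if (!result.contains(v)) result.add(v);
        }
        return result;
    }
}
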




Figure 2: Example result after voting. The first four ranks are not the overall four best cases, but
instead the four best cases of four distinct attack categories. For the test case with ID 48024, labeled as
Fuzzers, the Fuzzers agent provides the best case with 93 % similarity (ID 44364). Additionally, 7 further
distinct cases from the Fuzzers agent's case base with at least 88 % similarity are provided. Thus, in
terms of a majority vote, the attack has been correctly identified.
5. Evaluation
5.1. Results
Table 3 presents our results of querying the topic agents with the testing data set. We provided
the first 1000 cases of each attack category, and each activated topic agent voted with its 𝑛 = 10
most similar cases. Out of this pool of best cases (50 with 5 active agents), the 10 most similar
cases were chosen. Each correct vote is counted; if there are at least six correct
votes, the query is considered correctly classified. All correct votes are summarized
and a true-positive rate (𝑇 𝑃𝑅) is provided in the last line. Consequently, 100 − 𝑇 𝑃𝑅 depicts the
false-positive rate (𝐹 𝑃𝑅).
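
   A minimal sketch of this evaluation procedure, assuming each query's pooled votes are given as a
list of attack category labels (the representation is ours, not the prototype's):

import java.util.List;

// Sketch of the evaluation described above: a query counts as correctly
// classified if at least 6 of its 10 pooled votes carry the true attack
// category; the TPR is the share of correctly classified queries.
public final class EvaluationSketch {

    /** votesPerQuery: for each test query, the attack categories of its 10 best pooled cases. */
    static double truePositiveRate(List<List<String>> votesPerQuery, String trueCategory) {
        long correctlyClassified = votesPerQuery.stream()
                .filter(votes -> votes.stream().filter(trueCategory::equals).count() >= 6)
                .count();
        return 100.0 * correctlyClassified / votesPerQuery.size();
    }
}
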

              Majority Vote 𝑥        Backdoor Fuzzers Generic Shellcode Worm
                      0                 566          32         2            93       94
                      1                 254          33         1           150       33
                      2                 33           16          1           74        3
                      3                 12           14          1          111        0
                      4                 20           26          1          113        0
                      5                 24           18          0           88        0
                      6                 35           25          1           58        0
                      7                 33           40          3           75        0
                      8                 16           56          6           93        0
                      9                  7          142         21          81         0
                     10                  0          598        963          64         0
                    𝑇 𝑃𝑅               9,1 %       86,1 %    99,4 %       37,1 %     0%
                    𝐹 𝑃𝑅              90,9 %       13,9 %     0,6 %       62,9 %    100 %
                   Results of other approaches (no CBR) using the same dataset
             Wheelus et al. [16]         ranging from 69 % to 83 % TPR for all attacks
             Pratomo et al. [17]             69,21 % TPR on average for all attacks
             Mebawondu et al. [18]                  76,96 % TPR for all attacks
Table 3
Results. Each cell gives the absolute number of test queries that received 𝑥 correct votes for the
corresponding attack category. Additionally, other approaches and their results are listed to put our
results into context.



5.2. Limitations
During the first test runs, we encountered a few challenges with the data set, which resulted in
limitations of this prototype version. We are confident that these limitations can be lifted in future work.
   1. Redundancy
      As was to be expected, the training data set contains multiple redundant cases, i.e., cases
      with exactly the same attribute-value pairs. If these cases turn out to be the most similar
      cases for a given testing data query, the majority vote is easily flooded by the
      redundant cases. A relatively quick fix to this challenge would be to simply remove
      redundant cases and keep one pivotal case (see the deduplication sketch after this list).
      However, the occurrence of a large number of redundant cases might carry context
      information, which should not be discarded lightly. A more elegant and efficient way
      would be a proper introduction of case base maintenance under the aspect of pivotal
      cases and the coverage and reachability of cases in a case base, as introduced by
      Smyth & Keane [19].
      → For our tests, we focused on the Backdoor, Fuzzers, Generic, Shellcode, and Worm
      agents, which per se do not contain redundant cases.

   2. Same case, different attack category
      We identified multiple cases with exactly the same attribute-value pairs but different
      attack category labels (249-Analysis, 710-DoS, 1413-Reconnaissance, 1416-Exploits, 3421-
      Fuzzers). During the training phase and in tests within the training data set, this resulted
      in a 100 % similarity of a given case to multiple different attack categories, which is
      not a desirable result. During the testing phase, however, this problem did not seem to
      occur, so it is not a limitation here; for the sake of completeness, it should nevertheless
      be mentioned.

   3. Resources and iterations
      The machine which executed the testing data set (CPU i7-4790K @ 4 GHz, 16 GB RAM)
      could not provide enough resources to process all test cases, especially when reaching
      1000+ cases. Therefore, we filtered the testing data set by attack category and processed
      the first 1000 test cases per attack category. According to the trend of the results, this
      number is still representative of the remaining test data cases. The table below shows
      each attack category and the corresponding number of testing data cases:

                         Backdoor     Fuzzers   Generic    Shellcode    Worm
                           1746        18184     40000       1113        130
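
   As referenced in limitation 1, a minimal sketch of the mentioned 'quick fix' - removing exact
duplicates while keeping one pivotal case - could look as follows; this is plain deduplication and
not the competence-preserving deletion policy of Smyth & Keane [19].

import java.util.*;

// Sketch of the "quick fix" from limitation 1: drop exact duplicates from a
// case base and keep one pivotal case per group of identical attribute-value pairs.
public final class Deduplication {

    /** A case is represented here simply as its ordered list of attribute values. */
    static List<List<String>> removeRedundantCases(List<List<String>> caseBase) {
        // LinkedHashSet keeps the first occurrence (the pivotal case) and drops exact duplicates
        return new ArrayList<>(new LinkedHashSet<>(caseBase));
    }
}
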

6. Discussion and future work
We presented a novel approach of using case-based reasoning to support intrusion detection
systems, using the UNSW_NB15 data set for training and testing purposes. We established a
multi-agent CBR system prototype with at least one topic agent per attack category,
training these agents with the given training data set and modeling the similarity measures based
on the identification of distinct attribute-value pairs that characterize the attack categories.
   The results in Table 3, obtained despite the limitations described in section 5.2, vary strongly.
For Fuzzers and Generic, we receive very positive results by correctly identifying 86,1 %
and 99,4 % of the testing data - better than other (non-CBR) approaches. However, the other agents
do not achieve competitive results, especially the Worm agent with 0 % correct majority
votes. This is most likely due to its training data containing only 44 cases, whereas the
Generic agent contains 18871 cases. Yet, based on the data, Worms show by far the most
distinct and characteristic values, which leaves us optimistic about receiving better results after
further fine-tuning of the local similarity measures. The Backdoor agent also contains only 583
cases in its case base and, additionally, only two distinct attributes. Its other attributes can
wrongly be attributed to other attack categories, increasing the risk of false positives. Although
not actively evaluated here, the same problem arises for Analysis and Exploit: both attack
categories share very similar characteristics.
   However, there is still a lot of room for improvement, as mentioned above. Introducing a proper
case base maintenance system that removes redundant cases without reducing the competence
of the system by taking coverage and reachability into account, further adjusting the local
and global similarities, and learning further cases from additional datasets may improve the
overall performance of the system. Additionally, we have not yet integrated adaptation knowledge,
which still has to be identified but also promises an increase in performance. These
steps remain for future work, and we hope to spark some interest in using CBR within the IT
security domain.


References
 [1] K. Bach, Knowledge Acquisition for Case-Based Reasoning Systems, Ph.D. thesis, Univer-
     sity of Hildesheim, 2013. URL: http://www.dr.hut-verlag.de/978-3-8439-1357-7.html.
 [2] I. T. Union, Measuring digital development facts and figures 2021, ITUPublications, Geneva
     (2021). URL: https://www.itu.int/en/ITUD/Statistics/Documents/facts/FactsFigures2021.
     pdf, last validation: 04/30/2022.
 [3] S. Kapetanakis, A. Filippoupolitis, G. Loukas, T. Saad Al Murayziq, Profiling cyber attacks
     using case-based reasoning, in: 19th UK Workshop on Case-Based Reasoning, 2014, pp.
     39–48.
 [4] M. L. Han, H. C. Han, A. R. Kang, B. I. Kwak, A. Mohaisen, H. K. Kim, Whap: Web-hacking
     profiling using case-based reasoning, in: 2016 IEEE Conference on Communications and
     Network Security (CNS), 2016, pp. 344–345.
 [5] H. M. Mohssine El Ajjouri, Siham Benhadou, Case retrieval implementation for intrusion
     detection architecture based on multi agent systems and case based reasoning technique, in:
     International Journal of Scientific & Engineering Research, volume 10, 2019, pp. 1184–1189.
     ISSN 2229-5518.
 [6] R. F. Erbacher, S. E. Hutchinson, Extending case-based reasoning to network alert reporting,
     in: 2012 International Conference on Cyber Security, 2012, pp. 187–194.
 [7] Y. Wang, A. Zhu, J. Zhang, A case-based reasoning method for network security situa-
     tion analysis, in: 2011 International Conference on Control, Automation and Systems
     Engineering (CASE), 2011, pp. 1–4.
 [8] J. Long, D. Schwartz, S. Stoecklin, Application of case-based reasoning to multi-sensor
     network intrusion detection, in: Proceedings of the 4th WSEAS International Conference
     on Computational Intelligence, Man-Machine Systems and Cybernetics, CIMMACS’05,
     World Scientific and Engineering Academy and Society (WSEAS), Stevens Point, Wisconsin,
     USA, 2005, p. 260–269.
 [9] H. Y. Abutair, A. Belghith, Using case-based reasoning for phishing detection, Procedia
     Computer Science 109 (2017) 281–288. 8th International Conference on Ambient Sys-
     tems, Networks and Technologies, ANT-2017 and the 7th International Conference on
     Sustainable Energy Information Technology, SEIT 2017, 16-19 May 2017, Madeira, Portugal.
[10] M. Lansley, N. Polatidis, S. Kapetanakis, K. Amin, G. Samakovitis, M. Petridis, Seen the
     villains: Detecting social engineering attacks using case-based reasoning and deep learning,
     in: S. Kapetanakis, H. Borck (Eds.), Workshops Proceedings for the Twenty-seventh
     International Conference on Case-Based Reasoning co-located with the Twenty-seventh
     International Conference on Case-Based Reasoning (ICCBR 2019), Otzenhausen, Germany,
     September 8-12, 2019, volume 2567 of CEUR Workshop Proceedings, CEUR-WS.org, 2019,
     pp. 39–48. URL: http://ceur-ws.org/Vol-2567/paper4.pdf.
[11] M. Ring, S. Wunderlich, D. Scheuring, D. Landes, A. Hotho, A survey of network-based
     intrusion detection data sets, Computers & Security 86 (2019) 147–167.
[12] N. Moustafa, J. Slay, UNSW-NB15: a comprehensive data set for network intrusion
     detection systems (UNSW-NB15 network data set), in: 2015 Military Communications and
     Information Systems Conference (MilCIS), 2015, pp. 1–6.
[13] M. M. Richter, The knowledge contained in similarity measures, Invited Talk at the First
     International Conference on Case-Based Reasoning, ICCBR’95, Sesimbra, Portugal, 1995.
[14] R. Bergmann, Experience Management: Foundations, Development Methodology, and
     Internet-Based Applications, volume 2432 of Lecture Notes in Computer Science, Springer,
     2002. URL: https://doi.org/10.1007/3-540-45759-3.
[15] S. Wess, Fallbasiertes Problemlösen in wissensbasierten Systemen zur Entscheidungsunter-
     stützung und Diagnostik: Grundlagen, Systeme und Anwendungen (translated: Case-based
     problem solving in knowledge-based systems for decision support and diagnostic: basics,
     systems and applications), Ph.D. thesis, University of Kaiserslautern, 1995. Infix-Verlag.
[16] C. Wheelus, E. Bou-Harb, X. Zhu, Tackling class imbalance in cyber security datasets, in:
     2018 IEEE International Conference on Information Reuse and Integration (IRI), 2018, pp.
     229–232.
[17] B. A. Pratomo, P. Burnap, G. Theodorakopoulos, Unsupervised approach for detecting low
     rate attacks on network traffic with autoencoder, in: 2018 International Conference on
     Cyber Security and Protection of Digital Services (Cyber Security), 2018, pp. 1–8.
[18] J. O. Mebawondu, O. D. Alowolodu, J. O. Mebawondu, A. O. Adetunmbi, Network intrusion
     detection system using supervised learning paradigm, Scientific African 9 (2020) e00497.
[19] B. Smyth, M. T. Keane, Remembering to forget: A competence-preserving case dele-
     tion policy for case-based reasoning systems, in: Proceedings of the Fourteenth
     International Joint Conference on Artificial Intelligence, IJCAI 95, Montréal Québec,
     Canada, August 20-25 1995, 2 Volumes, Morgan Kaufmann, 1995, pp. 377–383. URL:
     http://ijcai.org/Proceedings/95-1/Papers/050.pdf.