<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Domain Name System (DNS) Tunnelling Detection using Structured Occurrence Nets (SONs)</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Computer Science and Engineering, University of Hai'l</institution>
          ,
          <addr-line>Hai'l</addr-line>
          ,
          <country country="SA">Saudi Arabia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computing, Newcastle University</institution>
          ,
          <addr-line>Newcastle upon Tyne</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <fpage>93</fpage>
      <lpage>108</lpage>
      <abstract>
        <p>Warnings about the increasing number of DNS tunnelling methods are on the rise. Attackers have used such techniques to steal data from millions of accounts. The existing literature has thoroughly demonstrated the extent of the damage which DNS tunnelling can cause to any given DNS server. However, such threats can be alleviated through SONs, Petri net-based formalisms which portray the behaviour of complex evolving systems. As a concept, SONs are originally grounded in Occurrence Nets (ONs) and have already yielded results in terms of successful cybercrime analysis. For instance, the addition of alternates to SONs, initially used in [10], was extended in [15] in order to model and analyse system activities such as cybercrime or accidents, which may show contradictory or uncertain evidence in terms of actual activity. The current paper proposes the use of SON features to detect DNS tunnelling in the event of an actual attack.</p>
      </abstract>
      <kwd-group>
        <kwd>DNS tunnelling</kwd>
        <kwd>Structured Occurrence Nets</kwd>
        <kwd>Detection of DNS Attacks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Over recent decades, internet usage has grown dramatically,
expanding to include everything from small online businesses to large
company websites. As a consequence, what was initially “the Web” became “Web
2.0”, given the number of applications that have been developed on top of
the internet. The Domain Name System (DNS) allows applications
to use names instead of numeric IP addresses, which are considerably more
difficult to work with. Several works [
        <xref ref-type="bibr" rid="ref12 ref4 ref5 ref9">4,5,9,12</xref>
        ] warn of the increasing
number of DNS tunnelling methods, and attackers have used these techniques
to compromise millions of accounts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The existing literature has demonstrated
the damage which DNS tunnelling can cause to any deployed DNS server. DNS
tunnelling allows attackers to transfer data in a way which violates established
system security policies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The danger of such an attack is that it
can occur without triggering any alarms. It appears to be legitimate
activity, because it uses the DNS protocol to transmit information as the
protocol intends, instead of exploiting DNS vulnerabilities or abusing
the system. The reason for using SONs to record DNS traffic lies in the
features that SONs provide. For example, a SON combines multiple related
ONs by using various formal relationships, in particular for communication. For
instance, SON events can be used to model various packet actions, such as
sending a query, receiving a query, sending a response and receiving a response,
as well as asynchronous communication. By means of these features, a SON is
able to depict and analyse a DNS case. SONs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] are a Petri net-based formalism
which portrays the behaviour of complex evolving systems. Their concept is
originally grounded in Occurrence Nets (ONs) and has already yielded results in terms
of successful cybercrime analysis. The approach which we propose is to use SON
features to detect DNS tunnelling in the event of a real attack.
      </p>
      <p>
        DNS Tunnelling. DNS tunnelling is a covert communication channel which
allows the traffic of other protocols (e.g., HTTP, Telnet, FTP, SSH)
to be encapsulated within DNS packets. This conceals the real protocol by making
the traffic look like ordinary DNS traffic. DNS tunnelling is well suited to
malicious activities such as data exfiltration and command and
control call-backs from within restricted internal networks. Unlike other
tunnelling methods such as SSH tunnelling, a distinctive property of DNS
tunnelling is that it uses the legitimate DNS servers of the internal network to relay
its packets to the final destination, which is the attacker-controlled
server. DNS tunnelling encapsulates the data in DNS queries and responses. In
this way, the outer protocol remains DNS; however, the encapsulated data
payload belongs to another protocol [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Figure 3 explains how DNS tunnelling
is performed. The main advantage of DNS tunnelling for an attacker is that, if a
compromised computer tries to send “secret” data to the attacker-controlled server
on the external network (the Internet), DNS tunnelling bypasses the restriction
even when such direct communication is blocked by the network firewall.
The compromised computer sends the secret data as a Base64-encoded
message, for example “c2VjcmV0”. It sends the query to the internal DNS
server, asking for the corresponding IP address, in the form of a basic DNS query
for c2VjcmV0.talalncl.com. The internal DNS server forwards this DNS
query to the authoritative DNS server for the DNS zone “talalncl.com”. Since
the authoritative DNS server is actually the attacker-controlled external server,
the message is delivered to its destination. The talalncl.com DNS server
then responds with a dummy answer, such as 127.0.0.1. Afterwards, the
talalncl.com DNS server analyses the query as follows:
Step 1: c2VjcmV0.talalncl.com,
Step 2: Base64Decode,
Step 3: c2VjcmV0 -&gt; secret
that is, the data is extracted from within the DNS query. A single
message within a DNS tunnel may look exactly
like a normal DNS query. This is also the weakness of DNS tunnelling:
although a single packet may look like a normal DNS query, thousands of queries
for sub-domains within the same domain help to reveal the DNS tunnel.
      </p>
      <p>
        The remainder of the paper presents the novel approach which we propose
for detecting such attacks, together with an algorithm and its implementation. The paper
is organised as follows: Section 1 is the introduction. Section 2 describes
the basic features of SONs, as well as background research concerning DNS and
the DNS tunnelling phenomenon. Section 3 discusses the related work.
Section 4 describes DNS experimentation and data preprocessing. Section
5 describes the novel solution used to detect DNS tunnelling. Section 6 focuses
on the implementation of the solution, as well as on the results and a critical
assessment of our algorithm. Section 7 provides concluding remarks.
      </p>
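      <p>The encoding and decoding steps of the tunnelling example in this section can be sketched as follows. This is a minimal illustration, not the code of any tunnelling tool: the function names are hypothetical, the zone talalncl.com is taken from the example in the text, and plain Base64 is used only because the text does so.</p>
      <preformat>
```python
import base64

# Zone from the example in the text; treated here as the attacker-controlled zone.
TUNNEL_ZONE = "talalncl.com"


def encode_query_name(secret: bytes, zone: str = TUNNEL_ZONE) -> str:
    """Build the DNS query name that carries `secret` as its first label."""
    label = base64.b64encode(secret).decode("ascii")
    return f"{label}.{zone}"


def decode_query_name(query_name: str, zone: str = TUNNEL_ZONE) -> bytes:
    """Recover the payload from a query name received for the tunnel zone."""
    suffix = "." + zone
    if not query_name.endswith(suffix):
        raise ValueError("query is not for the tunnel zone")
    label = query_name[: -len(suffix)]
    return base64.b64decode(label)


query = encode_query_name(b"secret")
print(query)                     # c2VjcmV0.talalncl.com
print(decode_query_name(query))  # b'secret'
```
</preformat>
      <p>Standard Base64 can emit characters that are not valid in host names, so a real tunnel would need a DNS-safe alphabet and would have to respect DNS label length limits; both aspects are left out of this sketch.</p>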
    </sec>
    <sec id="sec-2">
      <title>Terminologies</title>
      <sec id="sec-2-1">
        <title>Background</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Structured occurrence nets (SONs).</title>
      <p>
        The SONs [
        <xref ref-type="bibr" rid="ref11 ref6">6,11</xref>
        ] concept is originally grounded in occurrence nets (ONs), which
are directed acyclic graphs that represent causality and concurrency information
concerning a single execution of a system [
        <xref ref-type="bibr" rid="ref11 ref6">6,11</xref>
        ]. Figure 2 (a) shows an occurrence
net (ON). A SON consists of multiple ONs which are associated with each other
through various types of formal relationships. SONs are used for recording
information concerning the actual behaviour of a complex system, and any particular
evidence which can be collected in terms of analysing its past behaviour [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The
most useful way to use SONs is within a detailed analytical investigation,
although there is still a lack of investigation support systems which rely on SONs.
The significance of SONs results from the fact that their structuring reduces
complexity compared to that of any equivalent representation, and they provide
a direct means of modelling evolving systems.
      </p>
      <p>
        Communication in SONs is of two types: asynchronous and synchronous [
        <xref ref-type="bibr" rid="ref11 ref8">8, 11</xref>
        ].
Figure 2 (b) shows a communication structured occurrence net (C-SON) which
consists of two occurrence nets, namely ON 1 and ON 2. Figure 2 (b) depicts
asynchronous communication, which is represented by the dashed arc
between two events in different ONs, e.g., (e0) and (e2). The second type of
communication within SONs is synchronous communication, which is represented
by the two arcs between events (e1) and (e3) via the channel places (q1
and q2) [
        <xref ref-type="bibr" rid="ref11 ref8">8, 11</xref>
        ]. Thus, in any execution consistent with the causal relationships
captured by the C-SON, (e0) will never be executed after (e2), although the two
events can be executed simultaneously, whereas (e1) and (e3) will always be
executed simultaneously. SONs are underpinned by causal structures which
extend causal partial orders with an additional ordering, called weak causality. Two
events ordered in this way can be executed in the order given or simultaneously.
Moreover, if two events are weakly ordered in both directions (this means, in
particular, that weak causality is not assumed to be acyclic), then they can only
be executed simultaneously. In SONs, weak causality results from passing tokens
through channel places, whereas the non-channel places introduce the standard
causality, as in ONs. In Figure 2 (b), (e0) weakly causes (e2), and the events
(e1) and (e3) form a weak causality cycle.
      </p>
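      <p>The weak causality constraints described above can be checked mechanically. The following sketch assumes an execution is given as a list of steps, each step being a set of events executed simultaneously; the event names follow the description of Figure 2 (b), while the function name and data layout are hypothetical.</p>
      <preformat>
```python
from typing import List, Set, Tuple

# Weak causality pairs (a, b): a may occur before b or in the same step as b,
# but never after b. The pairs mirror the description of Figure 2 (b).
WEAK_CAUSALITY: List[Tuple[str, str]] = [
    ("e0", "e2"),  # asynchronous communication: e0 weakly causes e2
    ("e1", "e3"),  # synchronous communication: weak causality cycle
    ("e3", "e1"),
]


def respects_weak_causality(steps: List[Set[str]],
                            pairs: List[Tuple[str, str]] = WEAK_CAUSALITY) -> bool:
    """Return True if no pair (a, b) has a executed strictly after b."""
    position = {event: index for index, step in enumerate(steps) for event in step}
    for a, b in pairs:
        if a in position and b in position and position[a] > position[b]:
            return False
    return True


print(respects_weak_causality([{"e0"}, {"e2"}, {"e1", "e3"}]))  # True
print(respects_weak_causality([{"e2"}, {"e0"}, {"e1", "e3"}]))  # False
print(respects_weak_causality([{"e0", "e2"}, {"e1"}, {"e3"}]))  # False
```
</preformat>
      <p>The last call fails because (e1) and (e3) are weakly ordered in both directions, so the only consistent executions place them in the same step.</p>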
    </sec>
    <sec id="sec-4">
      <title>Domain Name System (DNS)</title>
      <p>
        The Domain Name System (DNS) assigns domain names and maps each of
them to the name servers which host the respective
domain [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. At the same time, network administrators
are allowed to delegate authority over some sub-domains of a given name space
to other name servers. This delegation avoids the need for a
single, large, centralised database; at the same time, the resulting
services are fault-tolerant [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Additionally, DNS defines the technical
parameters of the database service which is situated at its core. These parameters
are part of what is known as the DNS protocol, a detailed specification
of the data structures used within DNS and of the communication exchanges between
its components. The DNS protocol is part of the Internet Protocol
Suite. As a broad rule, two main name spaces are maintained across the
internet, namely the domain name hierarchy [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and IP (Internet Protocol) address spaces. DNS
accomplishes two tasks: on the one hand, it maintains the domain name hierarchy;
on the other hand, it provides translation services between that hierarchy and the
address space. To function properly, DNS relies on multiple servers as well
as on a communication protocol. Records for a specific domain are stored on a
particular type of DNS server called a DNS name server, which answers queries
against its database. When a user types a web address into the
browser, for instance www.google.com, the local DNS server (resolver)
checks whether the address is already stored within it; if so, it sends a
response to the user with the IP address of that website. However, if
the resolver does not recognise the domain, it sends the query to the DNS
root name server. The DNS root server sends a response back
to the resolver, telling it that the domain name (google.com) belongs to the name server
for the .com TLD. The resolver then sends the query to the .com name server,
which refers it to the name server for google.com. Finally, the resolver asks
the google.com name server for the IP address of www.google.com, receives
the answer, and returns it so that the website can be opened for the user [
        <xref ref-type="bibr" rid="ref13 ref6">6, 13</xref>
        ]. Figure 3 illustrates this
process.
      </p>
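      <p>The resolution steps described above can be illustrated with a toy simulation. It is only a sketch under simplifying assumptions: the three lookup tables stand in for the root, TLD and authoritative name servers, and all server labels and the IP address (taken from a documentation address range) are invented for illustration.</p>
      <preformat>
```python
from typing import Dict, List, Tuple

# Toy referral tables; every name and address here is hypothetical.
ROOT_REFERRALS = {"com": "com-tld-server"}
TLD_REFERRALS = {"google.com": "google-authoritative-server"}
AUTHORITATIVE_RECORDS = {"www.google.com": "203.0.113.10"}


def resolve(name: str, cache: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return (ip, trace) for `name`, answering from the cache when possible."""
    trace: List[str] = []
    if name in cache:
        trace.append("answered from resolver cache")
        return cache[name], trace
    labels = name.split(".")
    tld = labels[-1]
    domain = ".".join(labels[-2:])
    tld_server = ROOT_REFERRALS[tld]        # root refers to the TLD server
    trace.append("root refers to " + tld_server)
    auth_server = TLD_REFERRALS[domain]     # TLD server refers to the domain's server
    trace.append(tld_server + " refers to " + auth_server)
    ip = AUTHORITATIVE_RECORDS[name]        # authoritative server answers
    trace.append(auth_server + " answers " + ip)
    cache[name] = ip
    return ip, trace


cache: Dict[str, str] = {}
print(resolve("www.google.com", cache))
print(resolve("www.google.com", cache))  # second lookup is served from the cache
```
</preformat>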
      <p>
        Name servers. The Domain Name System is maintained through
a distributed database system which uses the client-server model. In this case,
the database nodes are the name servers. For each existing
domain, one or more DNS servers provide data concerning the respective
domain and the name servers of the domains subordinate to it.
At the top of this hierarchy are the root name servers, which typically
receive queries when a TLD must be resolved [
        <xref ref-type="bibr" rid="ref13 ref6">6, 13</xref>
        ].
      </p>
      <p>
        Location transparency. Location transparency helps to identify existing
network resources through names, rather than through location [
        <xref ref-type="bibr" rid="ref13 ref6">6,13</xref>
        ]. For instance,
a user accesses a file through a unique name; however, the actual data is stored in
sectors which are dispersed either across a network or within a local computer.
Within such a system, the actual location of a file is not relevant to the user;
however, a distributed system is required, so as to ensure a naming scheme for
the available resources. One of the essential gains of using location transparency
is that location is indeed irrelevant. Given a suitable network setup, a user
can obtain files from any computer connected to the network. As such, the
actual location of a given resource no longer matters, thus creating the
impression that the whole ensemble is accessible from any computer terminal,
which also facilitates software upgrades [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Using location transparency also
provides considerable versatility: system resources can be moved from
one computer to another within the network without disrupting the network.
      </p>
      <p>
        DNS records. The DNS protocol uses multiple record types for various
purposes. The most common record type is A, which maps
a domain name to an IPv4 address; AAAA records play the same
role for IPv6 addresses. Other record types, such as CNAME, are known
as aliases. A CNAME record points to another domain name,
but never directly to an IP address. At the same time, one IP address can have
several domain names or aliases. NS stands for name server; an NS record identifies
the authoritative name server which can resolve any sub-domains of the
respective domain. TXT records associate arbitrary text with a
specified domain name, while MX records provide information regarding mail
servers.
      </p>
      <sec id="sec-4-1">
        <title>Related work</title>
        <p>The review of the existing literature concerning DNS tunnelling led us to
conclude that there are two main types of analyses when it comes to this
phenomenon: traffic analysis and payload analysis. The focus of our research
pertains to the former.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Traffic Analysis</title>
      <p>
        DNS traffic volume per IP address. Traffic analysis detects attacks by
monitoring the traffic generated by specific IP addresses (Pietraszek, 2004). The idea is
simple: since each tunnelled data request carries at most 512 bytes, a complete
communication requires sending many requests (Van Horenbeeck, 2006). Moreover,
if the client polls the server, it will send requests continuously [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
DNS traffic volume per domain. The other fundamental method for
detecting DNS tunnelling involves analysing the amount of traffic to a particular
domain. Since tunnelling utilities are designed to tunnel data through specific domain names,
this method can indicate the possibility of DNS tunnelling (Butler, 2011), unless
the attack is spread across multiple domain names [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Number of hostnames per domain. Guy (2009) suggests that the number of
hostnames per domain may indicate the possibility of DNS tunnelling. In a
tunnel, each request has a unique hostname, meaning there are far more
hostnames than for a legitimate domain name. This method requires tuning
to determine an optimal threshold, since different tunnelling methods use different
numbers of hostnames [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, our approach is to check each domain and its
sub-domains, identifying the optimal threshold through logistic
regression, and to rely on the SON features for modelling causality and
communication.
      </p>
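      <p>The hostnames-per-domain heuristic can be sketched as follows. This is an illustration only: it assumes the base domain is simply the last two labels of a hostname, and the threshold value used in the example is arbitrary rather than the one estimated in this paper.</p>
      <preformat>
```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def base_domain(hostname: str) -> str:
    """Simplifying assumption: the base domain is the last two labels."""
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])


def hostnames_per_domain(hostnames: Iterable[str]) -> Dict[str, int]:
    """Count the distinct hostnames observed under each base domain."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for hostname in hostnames:
        seen[base_domain(hostname)].add(hostname.lower().rstrip("."))
    return {domain: len(names) for domain, names in seen.items()}


def suspicious_domains(hostnames: Iterable[str], threshold: int) -> Set[str]:
    """Domains whose number of distinct hostnames reaches the threshold."""
    counts = hostnames_per_domain(hostnames)
    return {domain for domain, count in counts.items() if count >= threshold}


observed = ["www.google.com", "maps.google.com", "www.google.com",
            "a1.us.to", "a2.us.to", "a3.us.to"]
print(hostnames_per_domain(observed))   # {'google.com': 2, 'us.to': 3}
print(suspicious_domains(observed, 3))  # {'us.to'}
```
</preformat>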
      <p>Domain history. Domain history can help detect DNS tunnelling, as it raises
suspicions about DNS traffic. It is used to detect tunnelling by determining
when A or NS records were added, since a given domain may have been acquired
only recently for DNS tunnelling purposes, meaning its NS record was recently added
(Zdrnja, 2007). This method is used to detect domains involved in malicious
activities.</p>
      <p>
        Visualisation. Visualisation can also be used to detect DNS tunnelling (Guy,
2009). It involves interactive, analyst-driven work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. By contrast, our
SON-based visualisation approach illustrates results automatically [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <sec id="sec-5-1">
        <title>Preprocessing Datasets</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Data collection</title>
      <p>Data collection constitutes the initial stage of our work. The first step consists of
discovering how DNS packets work, how they behave and what their structure is.
This step consists of two activities which were conducted in parallel:
first, an extensive study of the literature pertaining to the subject and second,
capturing real data from a local DNS server. Initially, we captured normal DNS
packets. Afterwards, we modelled the normal packet as a SON. Table 2
illustrates the DNS packet and Figure 4 shows the result. This type of data can
be modelled in SONs; however, the “Info” field holds several pieces of information which
must be split into different fields. The reason behind this split is to
distinguish whether the data packet is a query or a response. Additionally,
this split serves to extract the domain name and to determine the
packet’s ID (0x81fe), since this ID is the indicator
which links each query packet with its response. For instance, row 26 is the
query and it is linked with row 29, which is the response for the packet ID in
question (0x81fe).</p>
      <p>
        We developed a Python script [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to deal with this particular scenario. The
main idea of this script is to split the “Info” field into three subfields, namely
“Operation” (which is either query or response), “PacketID” and “Domain”, as
shown in Table 3. Afterwards, we modelled the resulting normal packet
to verify how it is represented within the SON, as shown in Figure 4.
      </p>
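      <p>A minimal sketch of such a split is given below. It is not the authors' script: it assumes that the “Info” field follows the summary format commonly produced by packet capture tools, for example “Standard query 0x81fe A www.google.com”, and the function name and the sample IP address are hypothetical.</p>
      <preformat>
```python
import re
from typing import Dict, Optional

# Assumed "Info" layout: "Standard query [response ]ID TYPE DOMAIN ...".
INFO_PATTERN = re.compile(r"^Standard query (response )?(0x[0-9a-fA-F]+) (\S+) (\S+)")


def split_info(info: str) -> Optional[Dict[str, str]]:
    """Split an Info string into Operation, PacketID and Domain subfields."""
    match = INFO_PATTERN.match(info)
    if match is None:
        return None  # not a DNS summary line in the assumed format
    is_response, packet_id, _record_type, domain = match.groups()
    return {
        "Operation": "response" if is_response else "query",
        "PacketID": packet_id,
        "Domain": domain,
    }


print(split_info("Standard query 0x81fe A www.google.com"))
print(split_info("Standard query response 0x81fe A www.google.com A 203.0.113.10"))
```
</preformat>
      <p>Because both rows share the same “PacketID”, a query can be paired with its response by grouping on that subfield.</p>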
      <p>Moreover, during the experiments, we collected attack packets resulting
from DNS tunnelling attempts. This was the second challenge which we
discovered and addressed. Table 4 shows one such attack packet. As can be seen,
if we attempt to model this packet, it looks the same as the normal packet
described above. For this reason, the aim of the research is to find an
algorithm which clearly distinguishes between normal packets and
attack packets. The idea is to examine each particular domain together with its
sub-domains, in order to discover whether or not a specific packet is
an attack packet.</p>
      <p>As such, the Python script was updated to split the fields with which we are
dealing in this case. The main idea of the updated script is that it
loops over each domain field and labels it with a unique ID, called the GroupID. As a
result, we can identify each chunk consisting of a domain and its sub-domains
by that GroupID, as Table 5 illustrates.</p>
      <p>Table 5 illustrates the link between the “Google” domain and its sub-domains.
For instance, “google.com” and “maps.google.com” have the same
group ID, namely “001”. However, the sub-domains (paaac5ay) and (paaydani) have
been grouped under “.us.to” with group ID “002”. So, the idea is
to first identify the domain, e.g. “google”, then loop to find every
sub-domain of “google” and link it with the same unique group ID.</p>
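      <p>The grouping step can be sketched as follows. This is not the authors' script: it assumes, for simplicity, that a domain and its sub-domains share their last two labels, and that group IDs are three-digit strings assigned in order of first appearance, which reproduces the “001” and “002” values of the example above.</p>
      <preformat>
```python
from typing import Dict, Iterable, List, Tuple


def assign_group_ids(domains: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair every domain with the GroupID of the domain chunk it belongs to."""
    group_ids: Dict[str, str] = {}
    rows: List[Tuple[str, str]] = []
    for domain in domains:
        # Simplifying assumption: the last two labels identify the chunk.
        key = ".".join(domain.lower().rstrip(".").split(".")[-2:])
        if key not in group_ids:
            group_ids[key] = f"{len(group_ids) + 1:03d}"
        rows.append((domain, group_ids[key]))
    return rows


for row in assign_group_ids(["google.com", "maps.google.com",
                             "paaac5ay.us.to", "paaydani.us.to"]):
    print(row)
```
</preformat>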
    </sec>
    <sec id="sec-7">
      <title>Mixing data</title>
      <p>
        In order to simulate a real scenario of DNS server behaviour, the collected data
consisted of two main parts. The first part included the data collected during
the attack experiment. The second part included normal data, collected from the
university. For this data to make sense, and in order to simulate a realistic
scenario, we created a Python script [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to mix these two kinds of data. The data
mix is evaluated at different proportions in the evaluation section.
      </p>
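      <p>A possible shape for such a mixing step is sketched below. It is not the authors' script: packets are treated as opaque rows, the function and parameter names are hypothetical, and the result is shuffled at random, whereas a real mix might instead interleave packets by timestamp.</p>
      <preformat>
```python
import random
from typing import List, Sequence


def mix_packets(normal: Sequence, attack: Sequence,
                attack_fraction: float, total: int, seed: int = 0) -> List:
    """Draw `total` packets, `attack_fraction` of which come from `attack`."""
    if attack_fraction > 1.0 or 0.0 > attack_fraction:
        raise ValueError("attack_fraction must lie between 0 and 1")
    n_attack = round(total * attack_fraction)
    n_normal = total - n_attack
    if n_attack > len(attack) or n_normal > len(normal):
        raise ValueError("not enough packets to build the requested mix")
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    mixed = rng.sample(list(attack), n_attack) + rng.sample(list(normal), n_normal)
    rng.shuffle(mixed)
    return mixed


normal_rows = [("normal", i) for i in range(100)]
attack_rows = [("attack", i) for i in range(100)]
mixed_rows = mix_packets(normal_rows, attack_rows, attack_fraction=0.10, total=50)
print(sum(1 for kind, _ in mixed_rows if kind == "attack"))  # 5
```
</preformat>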
      <sec id="sec-7-1">
        <title>Proposed Solutions</title>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Detection of DNS Tunnelling using SONs</title>
      <p>The main idea of our algorithm to detect DNS tunnelling is to model DNS data,
i.e., each particular transaction and the way it communicates with the local
DNS server. We assume that each packet is a unique ON and that the local DNS
server is one ON. Each packet (ON) communicates with the local DNS server
through a channel place (thick circle) via events (send query and receive query),
as shown in Figure 4. The client (ONc) sends a query to the local DNS server
(ONserver), and the DNS server then responds to the client, as shown in Figure 4.
This is the normal packet. When we modelled the attack packet, we found
that it is the same as the normal one in terms of communication and behaviour. The
reason is that we examined each packet separately. However, to detect
an attack we need to examine each domain together with its sub-domains in order to
determine whether we see normal or abnormal packets. For instance, if google.com,
maps.google.com and developers.google.com are involved, then we need to deal
with the google domain and all its sub-domains, and model them as one chunk of
domain packets. So, we group each domain and its sub-domains under a unique ID,
as in the preprocessing section above. The result is seen in Figure 5, where we can
distinguish between normal and abnormal packets (we displayed the attack ON
very small to fit the width of the text; however, the ON transaction is actually
bigger).</p>
      <p>Fig. 5: Normal and Abnormal ONs</p>
    </sec>
    <sec id="sec-9">
      <title>DNS Tunnelling detection algorithm</title>
      <p>The main idea behind the algorithm is to classify every ON input, whether it
is a normal input or an ON representing a DNS attack. Before going into more
depth, it should be mentioned that ONs refer to DNS packets. Furthermore, we
assume the local DNS server to be another ON. It does not matter in advance whether
inputs are normal or represent a DNS attack. We examine each ON input and
count its events which communicate with the DNS server ON. If the number
of those ON events is less than the threshold value, we flag it as a normal ON.
Otherwise, we flag it as an abnormal ON. Then, we check whether or not there are any
abnormal ONs; if so, we know that a DNS attack has occurred. Finally, we should
mention that the aforementioned threshold was identified via logistic regression,
a widely used statistical model. The idea behind this binomial model resides in
properly estimating the parameters of the logistic model. From a mathematical
standpoint, the model relies on variables with two possible values, for instance
“win” or “lose”, or “pass” or “fail”. As such, we applied logistic regression to the
available data in order to estimate the threshold, that is, how many events
within a particular GroupID should be considered suspicious.</p>
      <p>The calculated threshold. The threshold is at the core of the logistic regression model.
From a mathematical standpoint, logistic regression fits a linear
function of the required feature, such as the number of events, to the
dependent, dichotomous variable, i.e. “attack” or “no
attack”, and produces a threshold by using the logit
equation. This threshold can then be used in our algorithm. The
mathematical aim is to build an equation with the
coefficients (a) and (c), as follows: y = ax + c, where (x) is the number of
events and (y) is linked to the probability of an attack through the logit function. The values for both
the number of events and the dependent variable, namely “attack” or “no
attack”, are then inserted; this allows us to perform regression analysis, meaning
that we can calculate the values of (a) and (c) which maximise
the likelihood of (y) conforming to the ground truth values. Once these values are
known, they can be used to calculate the threshold.
      </p>
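      <p>To make the role of the coefficients concrete, the sketch below assumes the standard logistic form p = 1 / (1 + exp(-(ax + c))) and derives the number of events at which the modelled attack probability reaches a chosen level (0.5 by default). The coefficient values are hypothetical and were chosen only to keep the arithmetic easy to follow; they are not the values fitted in this paper.</p>
      <preformat>
```python
import math


def attack_probability(x: float, a: float, c: float) -> float:
    """Modelled probability of an attack for x events."""
    return 1.0 / (1.0 + math.exp(-(a * x + c)))


def event_threshold(a: float, c: float, p: float = 0.5) -> float:
    """Number of events at which the modelled probability equals p."""
    logit = math.log(p / (1.0 - p))  # equals 0 when p is 0.5
    return (logit - c) / a


# Hypothetical coefficients (not fitted values from the paper).
a, c = 0.5, -5.0
threshold = event_threshold(a, c)
print(threshold)                            # 10.0
print(attack_probability(threshold, a, c))  # 0.5
```
</preformat>
      <p>With a probability level of 0.5 the threshold reduces to -c / a, so any ON with at least that many events connected to the DNS server would be treated as suspicious under this assumption.</p>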
      <p>Algorithm 1: Detection DNS Tunnelling algorithm using SON (Part1)</p>
      <preformat>
for (Event current_event in current_on’s events) do
    if (current_event is connected to dns_server_on) then
        connected_events_count = connected_events_count + 1;
    end
end
if (connected_events_count &gt;= attack_max_events) then
    set_of_abnormal_ons[set_of_abnormal_ons_index] = current_on;
    set_of_abnormal_ons_index = set_of_abnormal_ons_index + 1;
else
    set_of_normal_ons[set_of_normal_ons_index] = current_on;
    set_of_normal_ons_index = set_of_normal_ons_index + 1;
end
end
</preformat>
      <p>Algorithm 1: Detection DNS Tunnelling algorithm using SON (Part2)</p>
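      <p>The classification step of Algorithm 1 could be written in Python roughly as follows. This is a sketch rather than the authors' implementation: the Event and OccurrenceNet classes are hypothetical stand-ins for the SON data structures, and the threshold value in the example is arbitrary.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    name: str
    connected_to_dns_server: bool = False  # True if linked to the DNS server ON


@dataclass
class OccurrenceNet:
    group_id: str
    events: List[Event] = field(default_factory=list)


def classify_ons(ons: List[OccurrenceNet],
                 attack_max_events: int) -> Tuple[List[OccurrenceNet], List[OccurrenceNet]]:
    """Split ONs into (normal, abnormal) by counting DNS-connected events."""
    normal: List[OccurrenceNet] = []
    abnormal: List[OccurrenceNet] = []
    for current_on in ons:
        connected = sum(1 for e in current_on.events if e.connected_to_dns_server)
        if connected >= attack_max_events:
            abnormal.append(current_on)
        else:
            normal.append(current_on)
    return normal, abnormal


google = OccurrenceNet("001", [Event(f"e{i}", True) for i in range(4)])
tunnel = OccurrenceNet("002", [Event(f"e{i}", True) for i in range(40)])
normal_ons, abnormal_ons = classify_ons([google, tunnel], attack_max_events=20)
print([on.group_id for on in abnormal_ons])  # ['002']
print(len(abnormal_ons) > 0)                 # True: an attack is reported
```
</preformat>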
      <sec id="sec-9-1">
        <title>Result Testing and Evaluation</title>
        <p>We considered the following proportions of attacking packets (as percentage of
the total packets):</p>
        <p>
          0%, 1%, 5%, 10%, 20%, 40%, 60%, 80%, 90%, 95%, 99% and 100%.
In addition, we used the sensitivity and specificity methods [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], developed to
evaluate a system of computer-assisted diagnosis. Hence we used a tried and
tested method, with the only change being made to the terminology, as follows:
        </p>
        <p>
          First, True Positive (TP) result: attack transactions were correctly
identified as attacks. Then, False Positive (FP) result: normal transactions were
incorrectly identified as attacks. Next, True Negative (TN) result: normal transactions
were identified as normal transactions. Finally, False Negative (FN) result: attack
transactions were incorrectly identified as normal transactions. In terms of the
data mix (of normal and attack packets), we created a Python script [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The
idea behind the script was to mix the two data sets, namely the normal and
attack packets. Each time we ran the script, we asked it to mix the data
according to a percentage from each file, such as 10% normal and 90% attack;
as a result, we obtained one file with mixed data. The main data sets for each
packet type, i.e., normal or attack packets, were collected during a five-minute
time span. The data for both sets consisted of roughly 700 DNS packets.
After running the algorithm, we checked the sensitivity and specificity through this
evaluation method, obtaining the following results. As evident in (1) in Table
6, there was no result for True Positives and False Negatives
because this data set had no attack packets.
        </p>
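        <p>Given the four counts defined above, sensitivity and specificity follow directly. The sketch below uses hypothetical function names and returns None when a measure is undefined, which corresponds to data sets that contain no attack packets or no normal packets.</p>
        <preformat>
```python
from typing import Optional


def sensitivity(tp: int, fn: int) -> Optional[float]:
    """Share of attack transactions identified as attacks: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total > 0 else None


def specificity(tn: int, fp: int) -> Optional[float]:
    """Share of normal transactions identified as normal: TN / (TN + FP)."""
    total = tn + fp
    return tn / total if total > 0 else None


# Hypothetical counts, used only to show the arithmetic.
print(sensitivity(tp=3, fn=1))  # 0.75
print(specificity(tn=9, fp=1))  # 0.9
print(sensitivity(tp=0, fn=0))  # None (no attack packets in the data set)
```
</preformat>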
        <p>However, in (2) in Table 6, when the data set had 1% attack and 99% normal
packets, we obtained 0% True Positives, which means that no attack packets were
detected as attack packets, although 1% of the data were attack packets. This is because
the number of attack packets in this data set was below the threshold. Regarding
False Negatives, we also obtained 0% because no attack packets were detected as
normal packets. Likewise, in (3) in Table 6, the same situation applied.
In (4) in Table 6, the algorithm detected 7.1% out of the 10% of attack
data; however, it failed to detect the remaining 2.9% of the attack packets. After checking
the data sets manually, we found that most of these packets were request
queries to which the DNS server did not respond, so our
algorithm skipped them. However, some packets within this 2.9% were not flagged
as attacks even though they had both query and response packets. In future research, we will
investigate why this happened. Another
interesting point is that, in (4) in Table 6, there are False Positives:
the algorithm detected 9.4% of the normal packets as attack packets. After
investigating this, we found that Google had sent many unusual requests to the DNS server,
and these request events were above our threshold. When
we reran the experiment, this issue did not recur. Moreover, as evident in (7)
in Table 6, no normal packets were detected as attack packets (False Positives).
Finally, in (15) in Table 6, the False Positives and True Negatives were “N/A”
because these data sets contained no normal packets at all.</p>
        <p>The paper proposes a novel solution to DNS tunnelling detection based on SONs.
A detection algorithm has been designed and implemented. Data
preprocessing and a set of experiments have also been discussed. Further work will focus
on improving the current algorithm so that it can read packets
automatically in real-time scenarios. In addition, we will
model and develop algorithms dealing with more than one DNS server at a time,
including all user packets. We will also use DNS tunnelling tools
other than “iodine”, in order to compare various attack behaviours and check how
the algorithm deals with these types of attacks.</p>
      </sec>
      <sec id="sec-9-2">
        <title>Acknowledgements</title>
        <p>The authors would like to thank the reviewers for their comments on the paper.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alharbi</surname>
          </string-name>
          , T.:
          <article-title>mix packets script</article-title>
          . https://github.com/talalsm/mix/ (May
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alharbi</surname>
          </string-name>
          , T.:
          <article-title>split packets script</article-title>
          . https://github.com/talalsm/mix/ (May
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Alharbi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koutny</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Visualising data sets in structured occurrence nets</article-title>
          .
          <source>PNSE</source>
          <year>2018</year>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Born</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gustafson</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Detecting DNS tunnels using character frequency analysis</article-title>
          .
          <source>arXiv preprint arXiv:1004.4358</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Farnham</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atlasis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Detecting DNS tunneling</article-title>
          .
          <source>SANS Institute InfoSec Reading Room</source>
          <volume>9</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Jung</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sit</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balakrishnan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morris</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>DNS performance and the effectiveness of caching</article-title>
          .
          <source>IEEE/ACM Transactions on Networking</source>
          <volume>10</volume>
          (
          <issue>5</issue>
          )
          ,
          <fpage>589</fpage>
          -
          <lpage>603</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kapp</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schütz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Connolly</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakel</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meza</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fenyo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eng</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adkins</surname>
            ,
            <given-names>J.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Omenn</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          , et al.:
          <article-title>An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis</article-title>
          .
          <source>Proteomics</source>
          <volume>5</volume>
          (
          <issue>13</issue>
          ),
          <fpage>3475</fpage>
          -
          <lpage>3490</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Koutny</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Randell</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Structured occurrence nets: A formalism for aiding system failure prevention and analysis techniques</article-title>
          .
          <source>Fundamenta Informaticae</source>
          <volume>97</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>41</fpage>
          -
          <lpage>91</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>van Leijenhorst</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chin</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>On the viability and performance of DNS tunneling</article-title>
          .
          <source>International Conference on Information Technology and Applications</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Randell</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhattacharyya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alharbi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koutny</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>SONCraft: A tool for construction, simulation, and analysis of structured occurrence nets</article-title>
          .
          <source>In: 2018 18th International Conference on Application of Concurrency to System Design (ACSD)</source>
          . pp.
          <fpage>70</fpage>
          -
          <lpage>74</lpage>
          . IEEE (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Randell</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhattacharyya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alharbi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koutny</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>SONCraft: A tool for construction, simulation, and analysis of structured occurrence nets</article-title>
          .
          <source>In: 2018 18th International Conference on Application of Concurrency to System Design (ACSD)</source>
          . pp.
          <fpage>70</fpage>
          -
          <lpage>74</lpage>
          . IEEE (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Merlo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papaleo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veneziano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aiello</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A comparative performance evaluation of DNS tunneling tools</article-title>
          .
          <source>In: Computational Intelligence in Security for Information Systems</source>
          , pp.
          <fpage>84</fpage>
          -
          <lpage>91</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mockapetris</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dunlap</surname>
            ,
            <given-names>K.J.</given-names>
          </string-name>
          :
          <article-title>Development of the domain name system</article-title>
          .
          <source>ACM SIGCOMM Computer Communication Review</source>
          <volume>18</volume>
          (
          <issue>4</issue>
          ) (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>A bigram based real time DNS tunnel detection approach</article-title>
          .
          <source>Procedia Computer Science</source>
          <volume>17</volume>
          ,
          <fpage>852</fpage>
          -
          <lpage>860</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Randell</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Occurrence nets then and now: The path to structured occurrence nets</article-title>
          . In:
          <string-name>
            <surname>Kristensen</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrucci</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (eds.)
          <source>Applications and Theory of Petri Nets</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>