<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using ILP to Analyse Ransomware Attacks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oliver Ray</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samuel Hicks</string-name>
          <email>sam.hicks.2014@my.bristol.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steve Moyle</string-name>
          <email>steve@amplifyintelligence.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Amplify Intelligence Ltd</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Bristol</institution>
        </aff>
      </contrib-group>
      <fpage>54</fpage>
      <lpage>59</lpage>
      <abstract>
        <p>This paper describes a preliminary study aimed at using the ILP system ALEPH to interactively assist human experts in learning rules to better understand the behaviour of cyberattacks. We develop an ILP formalism for representing network log data obtained from a sandbox computer that was deliberately infected with the CryptoWall-4 malware (a state-of-the-art ransomware attack known to be causing significant global disruption at the time of writing) and we show how ALEPH can be used to interactively learn simple rules comparable to those hand-crafted by a human expert. In so doing, we also identify some limitations of the mechanisms ALEPH currently provides to support incremental learning and we motivate some promising directions of future work.</p>
      </abstract>
      <kwd-group>
        <kwd>Cyberattack</kwd>
        <kwd>CryptoWall</kwd>
        <kwd>Incremental ILP</kwd>
        <kwd>ALEPH</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Defending computer systems from attack is a daunting task for even the most
expert humans. Cybercrime is increasingly industrialised, with skilled software
teams building programs designed to exploit abundant weaknesses in computer
security. Defenders must try to prevent all attacks while attackers only need to
get lucky once. The volume and skewed nature of available data are also against
the defenders: since the vast majority of computer interactions are not attacks,
traditional machine learning methods tend to produce overly-general models that
essentially classify all interactions as benign. Furthermore, the asymmetries in
the costs of making errors mean that machine-learned models which predict too
many false alarms (false-positives) rapidly lose the trust of the defenders; while
those that miss genuine attacks (false-negatives) are effectively useless.</p>
      <p>Malware defences come in two main forms: endpoint security systems that
run on the computer being defended and make detailed local observations about
individual user behaviour (e.g. operating system calls); and network security
systems that eavesdrop on the network interactions between computer systems
and observe the communications between multiple endpoints simultaneously.
The latter work by logging various aspects of network traffic using sniffer devices
called Network Intrusion Detection Systems (NIDS), and our goal is to show
how such logs may be used by ILP to help analyse ransomware attacks.</p>
      <p>The purpose of our work is to use ILP to help human defenders in
understanding the operation of cyber-attacks from NIDS log data. Our approach is
motivated by two observations. First, we believe network security systems will
become increasingly important in the fight against cybercrime as NIDS can be
transparently inserted into and used to defend entire computer networks. And,
since nearly all malware is currently transferred via the network (e.g. as email
attachments or 'accidental' web downloads), it should leave a trace in the logs.
Second, we believe ILP is well-suited to this task because it can potentially
exploit other background knowledge known to defenders, including a plethora of
syndicated threat intelligence information (e.g. known malware domains),
human expertise, and support communities (e.g. online forums).</p>
      <p>In contrast to previous work, our aim is to help a defender analyse a newly
discovered attack, as opposed to automatically inducing detection signatures for
previously known attacks. We do not seek to replace the defender with a machine;
but we wish to have them join forces in a way that amplifies their power. To this
end, the rest of this paper is organised as follows. Section 2 introduces the notion
of ransomware and the significant CryptoWall-4 example. Section 3 outlines our
ongoing experiments to induce an understanding of the workings of ransomware
from network logs. Finally, our preliminary findings and plans for future work
are described in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>Ransomware and CryptoWall</title>
      <p>
        Ransomware has been defined as "... a cryptovirology attack carried out using
covertly installed malware that encrypts the victim's files and then requests
a ransom payment in return for the decryption key that is needed to recover
the encrypted files. Thus, ransomware is an access-denial type of attack that
prevents legitimate users from accessing files since it is intractable to decrypt
the files without the decryption key ..." [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        In late 2015 a particular form of ransomware called CryptoWall-4 came to
the attention of cyber-defenders. Given that its predecessor CryptoWall-3 cost
victims several hundred million dollars in 2015 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], cyber-defenders had to work
hard and fast to understand and contain this new threat. The original unfolding
of how its behaviour was worked out is described on an online forum [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This
section merely outlines the basic concepts required to understand its operation.
      </p>
      <p>Attack Outline: The CryptoWall attack works as follows.</p>
      <p>1. An unsuspecting victim is phished. This takes many forms but often occurs via
an attachment to an email that the user opens. Typically this contains a URI
to a piece of (sometimes obfuscated) JavaScript code known as a Dropper.</p>
      <p>2. The Dropper (executing in the victim's browser, with the victim's privileges)
contacts one or more remote computers hosting the CryptoWall malware
executable file. Multiple malware servers are tried as the attacker does not want
a single point of failure (see Listing 1). Sometimes the names/IP addresses of
malware servers are known to the cyber-defence community, but alternative
sites are of course constantly popping up.</p>
      <p>3. The malware is immediately installed and executed, and sets to work encrypting
many or all of the files the victim has permission to write to.</p>
      <p>4. Within a short space of time, the malware has encrypted the victim's files. It then
opens the default browser in the victim's session and retrieves a ransom note
web page from the malware servers. This note includes payment links for the
victim to send Bitcoin (or other digital currency).</p>
      <preformat># VictimIP       Port   AttackerIP       Port  HostNameContacted  Resource
192.168.122.163  49184  103.21.59.9      80    shrisaisales.in    /ZUQce4.php?m=egw08th5kll
192.168.122.163  49185  173.237.136.250  80    myshop.lk          /6872VF.php?m=egw08th5kll
192.168.122.163  49186  195.208.1.122    80    frcconf.com        /o51qYV.php?w=egw08th5kll
192.168.122.163  49187  103.21.59.9      80    shrisaisales.in    /ZUQce4.php?f=okm0ua6s71c58
192.168.122.163  49188  173.237.136.250  80    myshop.lk          /6872VF.php?x=okm0ua6s71c58
192.168.122.163  49189  195.208.1.122    80    frcconf.com        /o51qYV.php?u=okm0ua6s71c58
192.168.122.163  49190  103.21.59.9      80    shrisaisales.in    /ZUQce4.php?r=5jjh2t0np4
192.168.122.163  49191  173.237.136.250  80    myshop.lk          /6872VF.php?r=5jjh2t0np4
192.168.122.163  49192  195.208.1.122    80    frcconf.com        /o51qYV.php?y=5jjh2t0np4</preformat>
      <p>Listing 1: Excerpt of consecutive HTTP log entries showing the CryptoWall-4
Dropper 'phoning home' to three locations to retrieve the ransomware code.
      </p>
      <p>An eavesdropping NIDS is able to observe steps 2 and 4 occurring, but not
step 3, as the encryption of the victim's files happens locally on the infected endpoint.</p>
      <p>Capturing NIDS logs: Security defenders analyse the behaviour of malware
by simulating the attack from within a sandbox computer system and recording
the behaviour of the malware. In this way, a NIDS<sup>1</sup> was used to produce logs from
the CryptoWall-4 malware. The four logs of interest were: conn.log, summarising
connections between source and host; dns.log, detailing DNS queries; http.log,
detailing HTTP requests/responses; and files.log, detailing file transfers
between source and host machines. We transformed these logs into ground Prolog
facts for use in reasoning about the attack (see Section 3).</p>
      <p>Other Domain Knowledge: External information known as threat
intelligence supplies additional background knowledge. For this attack, we are informed that shrisaisales.in,
myshop.lk, thegingod.com, frcpr.com, and adrive62.com (amongst others)
are known hosts for malware software downloads.</p>
      <p><bold>Experiment using ILP to help understand Ransomware</bold></p>
      <p>
        This section describes a controlled experiment to learn logical rules that help us
understand the behaviour of ransomware using the ILP system ALEPH [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The
motivation is to test the hypothesis that ILP can be used to recover reasonable
rules explaining how ransomware works. We chose ALEPH because it has
an incremental learning mode which suits the exploratory nature of
understanding attacks in conjunction with a human expert defender.
      </p>
      <p><sup>1</sup> In this work we used the open-source NIDS Bro (https://www.bro.org/).</p>
      <p>Raw log data detailing DNS and HTTP requests was obtained from a
sandbox computer that was deliberately allowed to become infected by CryptoWall-4.
Records were converted to a datalog representation (for which illustrative
examples are shown below for the predicates dns/8 and http/17).</p>
      <preformat>dns(date(2015,11,5,13,4,44,740), 'CZQ4Zw32jddFeiK8z2',
    ipv4(192,168,122,163), 'shrisaisales.in', 'C_INTERNET',
    'A', 'NOERROR', vector(ipv4(103,21,59,9))).
...
http(date(2015,11,5,13,4,45,7), 'CiT3vV2lnFWQC0NIAl',
     ipv4(192,168,122,163), ipv4(103,21,59,9), 'POST',
     'shrisaisales.in', '/ZUQce4.php', 'egw08th5kll', unset,
     'Mozilla/4.0...', 115, 0, 200, vector('F7Ze8e1xZMQcH7MbM8'),
     vector('text/plain'), unset, unset).</preformat>
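      <p>The paper does not show how raw log fields were turned into terms. As a minimal sketch of one such step, the helper below converts a dotted-quad address into the ipv4/4 term used in the facts above. The predicate names ip_term/2 and octet/1 are hypothetical, and the code assumes SWI-Prolog built-ins (atomic_list_concat/3 in split mode, atom_number/2, between/3).</p>
      <preformat>
```prolog
% Hypothetical helper (not from the paper): map a dotted-quad atom
% to the ipv4/4 term used in the dns/8 and http/17 facts.
% Assumes SWI-Prolog built-ins.
ip_term(Atom, ipv4(A, B, C, D)) :-
    atomic_list_concat(Parts, '.', Atom),   % split the atom on '.'
    Parts = [_, _, _, _],                   % exactly four octets
    maplist(atom_number, Parts, [A, B, C, D]),
    maplist(octet, [A, B, C, D]).

% An octet is an integer in the range 0..255.
octet(N) :- integer(N), between(0, 255, N).

% ?- ip_term('103.21.59.9', T).
% T = ipv4(103, 21, 59, 9).
```
</preformat>
      <p>Keeping addresses as structured terms rather than atoms lets later rules unify on whole addresses or on individual octets.</p>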
      <p>For convenience, time-stamps are written as date/7 terms (in a year, month,
day, hour, minute, second, millisecond format) with an associated predicate
after/2 to determine if a first given date term is strictly later than a second.
Other background predicates include http_category/2 to determine the HTTP
request return code as success, redirection, server_error, etc. Other projection
predicates are also included to select particular fields out of raw log records:</p>
      <preformat>http_domain_name_parameter(Machine, Domain, Name, Param) :-
    http(_,_,Machine,_,_,Domain,Name,Param,_,_,_,_,_,_,_,_,_).
...
successful_dns(Time, UID, Machine, Domain, IP) :-
    dns(Time, UID, Machine, Domain, 'C_INTERNET', 'A', 'NOERROR', vector(IP)).</preformat>
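      <p>The definitions of after/2 and http_category/2 are not given in the paper. One plausible way to write them is sketched below, assuming SWI-Prolog (for between/3) and ground date/7 terms with integer fields. The category names informational and client_error are assumptions, as are the numeric ranges, which follow the usual HTTP status code classes.</p>
      <preformat>
```prolog
% Sketch only: the paper names these predicates but does not define them.

% after(T1, T2): date/7 term T1 is strictly later than date/7 term T2.
% The standard order of terms compares compound terms with the same
% functor argument by argument, left to right, which matches the
% year, month, day, hour, minute, second, millisecond field order.
after(T1, T2) :-
    functor(T1, date, 7),
    functor(T2, date, 7),
    T1 @> T2.

% http_category(Code, Category): coarse class of an HTTP status code.
% 'informational' and 'client_error' are assumed names.
http_category(Code, informational) :- integer(Code), between(100, 199, Code).
http_category(Code, success)       :- integer(Code), between(200, 299, Code).
http_category(Code, redirection)   :- integer(Code), between(300, 399, Code).
http_category(Code, client_error)  :- integer(Code), between(400, 499, Code).
http_category(Code, server_error)  :- integer(Code), between(500, 599, Code).

% ?- after(date(2015,11,5,13,4,45,7), date(2015,11,5,13,4,44,740)).
% true.
```
</preformat>
      <p>With these clauses, the goal http_category(200, C) succeeds with C bound to success, matching the literal http_category(N, success) that appears in the bottom clause later in this section.</p>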
      <p>Studying the logs by hand showed that the malware makes several HTTP requests
such that the same parameter is sent to different domains, and different
parameters are sent to the same malware domain. The malware interacts with 3
malware domains and 2 pay sites. So we used these facts as positive and negative
examples, respectively, along with the following settings, to learn a characterisation
of a malware domain in terms of HTTP interactions:</p>
      <preformat>malware_domain('shrisaisales.in').
malware_domain('myshop.lk').
malware_domain('frc-conf.com').
not(malware_domain(('3wzn5p2yiumh7akj.partnersinvestpayto.com'))).
not(malware_domain(('3wzn5p2yiumh7akj.marketcryptopartners.com'))).
:- set(clauselength,10). :- set(depth,1000). :- set(i,3).
:- mode(1, malware_domain(+domain)). % modeh
:- mode(*, http_domain_name_parameter(-machine,+domain,-name,-parameter)).
:- mode(*, http_domain_name_parameter(+machine,-domain,-name,+parameter)).
:- mode(*, +name \= +name). :- mode(*, +parameter \= +parameter).
:- determination(malware_domain/1, http_domain_name_parameter/4).
:- determination(malware_domain/1, (\=)/2).</preformat>
      <p>With these settings<sup>3</sup> ALEPH's induce_incremental was used to learn the
following hypothesis, which correctly explains the way the dropper interacts with
its potential malware domain servers:</p>
      <preformat>malware_domain(Domain) :-
    http_domain_name_parameter(Machine, Domain, Name1, Param1),
    http_domain_name_parameter(Machine, Domain, Name2, Param1),
    http_domain_name_parameter(Machine, Domain, Name1, Param2),
    Name1 \= Name2, Param1 \= Param2.</preformat>
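      <p>To see what the learned clause requires of a domain, it can be run against a small fact base. The sketch below uses the clause as printed in this paper together with four hypothetical http_domain_name_parameter/4 facts. The machine, domain, name and parameter values are invented for illustration and are not taken from the CryptoWall-4 logs.</p>
      <preformat>
```prolog
% Hypothetical facts for illustration only (not from the paper's logs).
http_domain_name_parameter(m1, 'evil.example',   '/a.php',      p1).
http_domain_name_parameter(m1, 'evil.example',   '/b.php',      p1).
http_domain_name_parameter(m1, 'evil.example',   '/a.php',      p2).
http_domain_name_parameter(m1, 'benign.example', '/index.html', p1).

% The learned clause, as printed in the paper.
malware_domain(Domain) :-
    http_domain_name_parameter(Machine, Domain, Name1, Param1),
    http_domain_name_parameter(Machine, Domain, Name2, Param1),
    http_domain_name_parameter(Machine, Domain, Name1, Param2),
    Name1 \= Name2, Param1 \= Param2.

% ?- malware_domain('evil.example').     succeeds
% ?- malware_domain('benign.example').   fails
```
</preformat>
      <p>The first domain is covered because one machine sent it two different names with the same parameter and two different parameters with the same name. The second domain has a single request, so neither inequality can be satisfied.</p>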
      <p>After adding this hypothesis to our theory, we continued using ALEPH
interactively with the following mode declarations (where we have omitted
the determinations to save space)<sup>4</sup> to learn a definition of a malware fetch:</p>
      <preformat>:- mode(1, malware_fetch(+time,+machine,+domain)). % modeh
:- mode(*, successful_dns(-time,-uid,+machine,+domain,-ip)).
:- mode(*, malware_domain(+domain)).
:- mode(*, http(+time,-http_id,+machine,-ip,#http_command,
                +domain,-uri_name,-uri_parameter,-referer,
                -user_agent,-size,-size,-response_code,
                -malware_request_uid,#mime_type,
                -malware_response_uid,#mime_type)).
:- mode(*, after(+time,+time)).
:- mode(*, http_category(+response_code,#category)).
:- mode(*, +ip = +ip).
:- mode(*, gt1000(+size)).</preformat>
      <p>
        Given a single positive example that we obtained by hand from the logs,
ALEPH constructs the following Bottom Clause [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]:
      </p>
      <preformat>malware_fetch(A,B,C) :-
    successful_dns(D,E,B,C,F), malware_domain(C),
    http(A,G,B,F,'POST',C,H,I,J,K,L,M,N,O,
         vector('text/plain'),P,vector('text/plain')),
    after(A,D), http_category(N,success), gt1000(M).</preformat>
      <p>
        Due to the lack of negative examples, ALEPH initially proposed an overly
general hypothesis. For technical reasons we could not use the 'overgeneral'
option provided by ALEPH to automatically refine the hypothesis by θ-subsumption,
because it only led to a sequence of successive hypotheses all logically equivalent
to the rejected clause. But, by hand-crafting further examples and constraints,
we were able to learn the more specific rule below, which subsumes one crafted
by a human expert and correctly states that a malware fetch involves the return
of a large file from an HTTP request to a malware domain:
      </p>
      <preformat>malware_fetch(A,B,C) :-
    malware_domain(C), http(A,D,B,E,'POST',C,F,G,H,I,J,K,L,M,
        vector('text/plain'),N,vector('text/plain')), gt1000(K).</preformat>
      <p>
        <sup>3</sup> Refer to the ALEPH manual [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for an explanation of the notation used.
      </p>
      <p><sup>4</sup> Note that predicate gt1000/1 is true if its argument is greater than 1000.</p>
      <p>
        Logically encoding a computer security domain has previously been successful
in an endpoint security context. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] ILP was used to learn to detect buffer
overflow attack construction strategies [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Our current work differs in that,
firstly, it focuses on a network security context and, secondly, it aims to help
humans understand attacks rather than just detect them.
      </p>
      <p>Although we have achieved a working proof-of-principle reconstruction of
some simple rules, we are working on several extensions of this study that we
believe will result in some more powerful and user-friendly interactive ILP
mechanisms to help humans carry out exploratory learning on data-rich domains.</p>
      <p>In particular, we believe it is important to overcome some limitations of
ALEPH's existing approach to eliminating 'overgeneral' hypotheses through the
introduction of meta-constraints, which we have found can lead to sequences of
increasingly complex but logically equivalent hypotheses that frustrate the user's
attempts to refine an overly general clause.</p>
      <p>We have also concluded that it is important to exploit
the vast amount of external real-world data, such as logs from other computers
that have not necessarily been infected with malware. Their many
benign interactions can provide evidence to justify the rebuttal of
incorrect hypotheses, and in such cases they also provide a useful audit trail with
connections to linked data sources.</p>
      <p>Finally, we believe that automatic methods for exploring language bias and
the use of event calculi for reasoning about temporal transactions will play an
important role in future work.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgements</title>
      <p>This work is supported by the EPSRC Summer Bursary Scheme. We also thank
our colleague Paul Byrne for providing the detailed ransomware logs.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>Aleph One</string-name>
          .
          <article-title>Smashing The Stack For Fun And Profit</article-title>
          .
          <source>Phrack</source>
          <volume>49</volume>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Moyle</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Heasman</surname>
          </string-name>
          .
          <article-title>Machine Learning to Detect Intrusion Strategies</article-title>
          .
          <source>KES</source>
          <year>2003</year>
          , LNCS
          <volume>2773</volume>
          :
          <fpage>371</fpage>
          -
          <lpage>378</lpage>
          ,
          <year>2003</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. BleepingComputer.com.
          <article-title>CryptoWall 4.0: Help Your Files Ransomware Support Topic</article-title>
          . http://www.bleepingcomputer.com/forums/t/595215/cryptowall-40-help-your-files-ransomware-support-topic/
          ,
          <year>November 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          .
          <article-title>The Aleph Manual</article-title>
          . http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>T.</given-names>
            <surname>Simonite</surname>
          </string-name>
          .
          <article-title>Holding Data Hostage: The Perfect Internet Crime? Ransomware (Scareware)</article-title>
          .
          <source>MIT Technology Review</source>
          , February
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <collab>Cyber Threat Alliance</collab>
          .
          <article-title>Lucrative Ransomware Attacks: Analysis of the CryptoWall Version 3 Threat</article-title>
          . http://cyberthreatalliance.org/cryptowall-report.pdf,
          <year>October 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>S.</given-names>
            <surname>Muggleton</surname>
          </string-name>
          .
          <article-title>Inverse Entailment and Progol</article-title>
          .
          <source>New Generation Computing</source>
          <volume>13</volume>
          (
          <issue>3-4</issue>
          ):
          <fpage>245</fpage>
          -
          <lpage>286</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>