<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>The Italian Conference on CyberSecurity, May</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Framework for Automating Security Assessments with Deductive Reasoning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mauro Andreolini</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Artioli</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Ferretti</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirco Marchetti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Colajanni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudia Righi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica - Scienza e Ingegneria, Università di Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Ingegneria Enzo Ferrari, Università di Modena e Reggio Emilia</institution>
          ,
          <addr-line>Modena</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dipartimento di Scienze Fisiche, Informatiche e Matematiche, Università di Modena e Reggio Emilia</institution>
          ,
          <addr-line>Modena</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>0</volume>
      <fpage>3</fpage>
      <lpage>05</lpage>
      <abstract>
        <p>Proper testing of hardware and software infrastructure and applications has become mandatory. To this end, security researchers and software companies have released a plethora of domain-specific tools, libraries and frameworks that assist human operators (penetration testers, red teamers, bug hunters) in finding and exploiting specific vulnerabilities, and in orchestrating the activities of a security assessment. Most tools also require minor reconfigurations in order to operate properly with isomorphic systems, characterized by the same exploitation path even in the presence of different configurations. In this paper we present a human-assisted framework that tries to overcome the aforementioned limitations. Our proposal is based on a Prolog expert system with facts and deductive rules that allow new facts to be inferred from existing ones. Rules are bound to actions whose results are fed back into the knowledge base as further facts. In this way, a security assessment is treated like a theorem that has to be proven. We have built an initial prototype and evaluated it in different security assessments of increasing complexity (jeopardy and boot-to-root machines). Our preliminary results show that the proposed approach can address the following challenges: (a) reaching non-standard goals (which would be missed by most tools and frameworks); (b) solving isomorphic systems without the need for reconfiguration; (c) identifying vulnerabilities from chained weaknesses and exposures.</p>
      </abstract>
      <kwd-group>
        <kwd>Security assessments</kwd>
        <kwd>Deductive reasoning</kwd>
        <kwd>Prolog</kwd>
        <kwd>Isomorphic systems</kwd>
        <kwd>Automation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The vast majority of today’s services are made available through Internet-based systems that
reach a wide audience with high performance, availability and scalability [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] through a
multitude of devices (smartphones, laptops, desktops) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The security of the underlying
protocols, system components and interconnects has become of vital importance to managers,
designers, programmers, administrators and end users, since a breach might have unforeseen
and critical consequences, often leading to business disruption [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This increasing attention
to cyber security evidently arises from external threats, which mainly exploit outdated and
vulnerable systems, weak security controls and logical bugs. Even worse, inexperienced and
insufficiently trained personnel often represent the greatest obstacle to security in the area
of cyber crime. Previous studies [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ] show that most cyber security incidents result from
human errors, misinterpretation of system policies and posture.
      </p>
      <p>
        Common sense, even before previous research [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ], suggests that in various contexts, in
order to decrease the risk of security incidents, proper and continuous security assessments need
to be conducted on systems. To this purpose, security assessments are carried out by operators
(certified professionals, security researchers, system administrators) with different goals in
mind: (a) estimate the security risk of an exposed infrastructure (vulnerability assessments); (b)
identify and exploit all possible vulnerabilities in a system, showing the consequences of an
attack (penetration testing); (c) simulate a skilled adversary that penetrates an infrastructure to
achieve a specific mission objective, such as data exfiltration, denial of service, monitoring of
specific users (red teaming).
      </p>
      <p>In this complex scenario, security assessments are increasingly difficult to conduct for the
following reasons. The attack surface of current systems grows larger over time.
Furthermore, attack patterns are getting more complex, ranging from structural exploits to
lateral movement, cross-domain privilege escalations and exploitation of logical bugs which
may not be immediately caught by popular security tools. To make matters worse, existing
software offers little to no support for task planning, definition of non-trivial goals and task
orchestration. Finally, most tools also require minor reconfigurations in order to operate
properly with isomorphic systems, characterized by the same exploitation path even in the presence
of different configurations.</p>
      <p>In this paper we present a human-assisted framework that tries to address the aforementioned
limitations. The framework is based on a Prolog expert system managing a knowledge base with
facts, deductive rules and associated task templates that are executed if a rule is found
to apply. In this way, a security assessment is treated like a mathematical theorem that has to
be proven by applying corollaries and initial conditions. A human operator defines a specific
goal and asks if it is achievable with the currently available knowledge. The system applies
deductive rules recursively and executes the associated tasks to solve intermediate subgoals.
The results of these tasks are translated into new facts and fed back into the knowledge base.
Operating in this fashion presents several advantages over current tools:
• it organizes existing knowledge about TTPs in a clear, structured, efficient way;
• it allows custom assessment goals to be defined through deductive rules (e.g., “Can I exfiltrate
all PDF documents?”) that go far beyond the classic ones found in CTFs and penetration
testing labs (“Am I root on this machine?”, “Can I read /root/flag.txt?”);
• it computes a detailed and minimal task plan for a given assessment goal, if
sufficient facts and rules are present;
• it solves isomorphic systems by applying the same deductive rules to slightly
different facts.</p>
      <p>We have built an initial prototype and evaluated it in different security assessments of
increasing complexity. Our preliminary experimental results show that the proposed approach
can address the following challenges: (a) reaching non-standard goals (which would be missed
by most tools and frameworks); (b) solving isomorphic systems without the need for
reconfiguration; (c) identifying vulnerabilities from chained weaknesses and exposures.</p>
      <p>The remainder of this paper is structured as follows. Section 2 discusses related work and
compares our approach with the current state of the art. Section 3 briefly describes the proposed
architecture. Section 4 discusses our testbed and some preliminary results. Section 5 concludes
the paper with future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>In this section we discuss and compare our proposal with previous research efforts in the
following areas: (a) existing tools, libraries and frameworks serving human operators in security
assessments; (b) modeling the activities of a security assessment; (c) defining and enforcing
non-trivial goals (that go beyond escalating to administrator or capturing a flag); (d) finding complex
vulnerabilities that chain multiple weaknesses; (e) solving isomorphic systems.</p>
      <sec id="sec-2-1">
        <title>2.1. Existing tools, libraries and frameworks</title>
        <p>A security assessment is carried out through a sequence of tasks with the aid of several cyber
weapons: single-purpose tools, libraries, frameworks and sources of information.</p>
        <p>
          Single-purpose tools. In single-purpose tools (from now on, tools) the goal is implicitly
hardcoded and, often, domain specific (e.g. Web, binary analysis, networking, privilege
escalation). The tool is programmed to automate a specific task and to report on whether a given
system component is exposed, vulnerable, exploitable or not. It is the operator’s
responsibility to orchestrate these tools into a task plan. Popular open source tools include scanners
(Nikto, Nmap), exploitation tools (SQLmap [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]), privilege escalation checkers (LinPEAS and
WinPEAS [11]) and all the proof-of-concept scripts that exploit public CVEs.
        </p>
        <p>Libraries/toolkits. Libraries typically provide operators with a set of functions for writing their
own custom tools with a hardcoded specific goal when the popular ones fail to solve a specific
problem. Very often, tools are distributed with the library (in which case the library is also called
a toolkit). The extra flexibility provided by libraries has a cost in terms of the programming
skills that an operator must have (which is not always the case). As with tools, the operator
is responsible for orchestrating libraries and custom tools into a task plan. Popular libraries
are the Python requests [12] module, the pwntools exploit development package [13] and the
Impacket [14] Python package for low-level interaction with Windows network services.</p>
        <p>Frameworks. Frameworks are a coherent collection of binaries, libraries and user interfaces
that allows an operator to orchestrate a larger portion of a security assessment. Popular
frameworks include BloodHound [15] (to reveal the hidden and often unintended relationships
between objects within an Active Directory or Azure environment), angr [16] (for binary
analysis), Metasploit [17] (to orchestrate reconnaissance, vulnerability analysis, exploitation
and post-exploitation activities) and its GUI frontend Armitage [18]. In contrast to tools and
libraries, frameworks offer some support for attack automation and let the operator choose among
multiple goals. For example, angr uses symbolic execution to achieve the goals specified by a
user (e.g., find the code path that crashes an application or reveals the correct password). Here,
the goal is explicitly set through programming language primitives and might be obscure to a
non-expert. Armitage provides a Hail Mary function that tries all known exploit modules on
every single service discovered in a network. Here, attack automation is enforced in a trivial
fashion through an implicitly defined goal (find any exploitable vulnerability) which is carried
out inefficiently and noisily (it can be easily detected by defenders). BloodHound offers a
graph representation of potential avenues of attack that allow an attacker to escalate across Windows
domain users, groups and machines. It also supports arbitrary queries that expose custom attack
paths.</p>
        <p>Sources of information. Sources of information are documents that explain the internals
of the systems under test and the Tactics, Techniques and Procedures (TTPs) used to exploit
them. They range from unstructured (blogs, papers) to structured ones (checklists, mind maps,
attack trees, attack graphs). An operator uses these to define the exact sequence of tasks in the
task plan of a security assessment. Building a task plan is usually a manual, tedious, error-prone
activity.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Modeling security assessments</title>
        <p>Security assessment activities are often modeled through a complex sequence of intermediate
stages, in which privileges are gradually acquired up to the point of reaching a specific goal. In
such circumstances, it might be difficult to reconstruct the complete attack path and identify the
concatenation of techniques and tools. A model is a formal representation aimed at describing
the activities of a security professional in terms of techniques used and vulnerabilities exploited
in systems and configurations. The purpose of a model is to identify the most probable routes
within this sequence. Current literature provides several models, outlined below.</p>
        <p>An attack tree [19] represents the attacks to a system and related countermeasures as a
tree structure. The root node is the ultimate goal of the attack. Intermediate nodes represent
intermediate stages of an attack. Intermediate siblings can be combined in AND or OR mode;
AND nodes represent the different steps in achieving a goal, while OR nodes represent different
ways to achieve the same goal. Leaf nodes are attacks; they can be labelled to enrich the
context of the attack. The notion of attack tree has been extended in literature. Kordy et al. [20]
introduce attack-defense trees that also include possible counteractions of a defender. Since
interactions between an attacker and a defender are modeled explicitly, this extended formalism
allows for a more thorough and accurate security analysis compared to regular attack trees.
Zonouz et al. [21] introduce the attack-response tree, basically an attack-defense tree that also
includes intrusion detection uncertainties due to false positives and negatives in detecting
successful intrusions.</p>
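        <p>The AND/OR semantics described above can be illustrated with a short Python sketch. It is not taken from any of the cited works: the Node class, the achievable() function and the example tree are hypothetical names and data, introduced only to show how a goal is evaluated bottom-up from the leaf attacks an attacker can actually carry out.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of an attack tree: a LEAF attack, or an AND/OR combination."""
    label: str
    kind: str = "LEAF"  # "LEAF", "AND" or "OR"
    children: list = field(default_factory=list)


def achievable(node, feasible_leaves):
    """Return True if the goal at `node` can be reached with the given leaf attacks."""
    if node.kind == "LEAF":
        return node.label in feasible_leaves
    results = [achievable(child, feasible_leaves) for child in node.children]
    # AND: every step is required; OR: any alternative is enough.
    return all(results) if node.kind == "AND" else any(results)


# Hypothetical tree: root access needs a foothold AND a privilege escalation.
tree = Node("root access", "AND", [
    Node("foothold", "OR", [
        Node("ssh password guessing"),
        Node("web shell upload"),
    ]),
    Node("privilege escalation", "OR", [
        Node("setuid binary abuse"),
        Node("kernel exploit"),
    ]),
])

print(achievable(tree, {"web shell upload", "setuid binary abuse"}))
print(achievable(tree, {"web shell upload"}))
```
]]></preformat>
        <p>In this sketch the first call succeeds because one alternative is available under each OR node, while the second fails because no privilege escalation leaf is feasible and the root is an AND node.</p>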
        <p>Attack trees can quickly become complex as the number of vertices and edges increases; in
particular, it becomes increasingly expensive to identify all paths from a leaf node to the root
node. Keep in mind that in realistic scenarios the number of nodes and interconnections can
easily exceed thousands. In these conditions the addition of a single node is sufficient to cause
a significant increase in the number of arcs, with a consequent increase in the new possible
attack paths. Furthermore, since the root node represents the ultimate goal of the attack, it may
be necessary to resort to multiple attack trees to model a complex multi-stage attack.</p>
        <p>An attack graph [22] combines information related to network topology, eligible
vulnerabilities and exploits available on the assets of an IT infrastructure, by providing a visual
representation of the attack paths that an attacker must undertake to achieve specific objectives.
An attack graph allows the analyst to highlight the structure of a network and to quickly identify
the critical paths most subject to attacks. These activities are essential and preparatory to the
subsequent phases of hardening and remediation. An extension of the traditional attack graph is
the Bayesian attack graph [23], which adds probabilities to the edges for modeling uncertainty
in state transitions between nodes. In particular, edges include the probability of exploitation
by an attacker. Therefore, the overall probability of reaching the last state is computed based
on the combinations of these probabilities.</p>
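        <p>As a rough numerical illustration of how edge probabilities combine, the following Python sketch multiplies the probabilities along one path and then merges alternative paths. It relies on a strong simplifying assumption that is ours and not part of the cited model, namely that all edges and all paths are statistically independent; a full Bayesian attack graph uses conditional probability tables instead. The function names and the probability values are hypothetical.</p>
        <preformat><![CDATA[
```python
from math import prod


def path_probability(edge_probabilities):
    """Probability of traversing every edge of one attack path (edges assumed independent)."""
    return prod(edge_probabilities)


def goal_probability(paths):
    """Probability that at least one of several independent paths reaches the goal."""
    return 1.0 - prod(1.0 - path_probability(path) for path in paths)


# Two hypothetical paths to the same goal: a two-step path and a one-step path.
paths = [[0.8, 0.5], [0.3]]
print(round(path_probability(paths[0]), 2))
print(round(goal_probability(paths), 2))
```
]]></preformat>
        <p>With these values the two-step path succeeds with probability 0.8 × 0.5 = 0.4, and the goal is reached with probability 1 − (1 − 0.4)(1 − 0.3) = 0.58.</p>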
        <p>Although very useful, these models offer a static view of attacks and mitigations to a system;
they do not model the actions a human actually carries out on a live system. In contrast,
our proposal associates template actions with the intermediate stages, thus enabling automated
execution of complex assessment paths.</p>
        <p>Machine learning techniques such as deep learning and reinforcement learning have been
investigated to mimic the behavior of attackers [24, 25, 26, 27]. However, while very
interesting, most of these approaches have not yet been tried in realistic environments and model
vulnerabilities almost exclusively through CVEs [28].</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Enforcing complex goals</title>
        <p>
          In most security assessment tools and libraries (SQLmap [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], Nikto [29], Nmap [30],
LinPEAS [11] and pwntools [13] to name a few) the goal is implicitly hardcoded and,
often, domain specific (Web, binary, network, privilege escalation). Here, setting a specific goal
simply does not make any sense; the tool is programmed to automate a specific task and to
answer whether a given component is vulnerable or not.
        </p>
        <p>Security assessment frameworks offer improved support for goals to a varying extent. For
example, angr [16] (a binary analysis framework) uses symbolic execution to achieve the
goals specified by a user (e.g., find the code path that crashes an application or reveals the
correct password). Here, the goal is explicitly set through programming language primitives
and might be obscure to a non-expert. Armitage [18] (a GUI frontend of the Metasploit [17]
penetration testing framework) provides a Hail Mary function that tries all known exploit
modules on every single service discovered in a network. Here, the goal is implicitly defined,
trivial (find any exploitable vulnerability), inefficient and noisy (it can be easily detected by
defenders). In contrast, our proposal allows arbitrary goals to be specified through precise questions
to the knowledge base. Those questions are answered efficiently by the Prolog solver through
backtracking.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Finding complex vulnerabilities</title>
        <p>To the best of our knowledge, in the landscape of security assessment software, support for
chaining existing weaknesses into complex vulnerabilities is limited and, at most, domain
specific. In the latter case, chaining of weaknesses is achieved through pluggable modules
that are configured manually by an operator. For example, SQLmap can bypass Web
Application Firewall (WAF) filters through so-called tamper scripts that change the payload
accordingly before sending the actual SQL injection. Pwntools can automatically solve
simple buffer overflow vulnerabilities by abusing several weaknesses at once (SETUID binary,
missing input validation, missing address space layout randomization, address leaks) and building the
appropriate payload. Nmap can be extended modularly through its Nmap Scripting Engine (NSE)
to perform sophisticated, multi-stage enumeration activities. The Metasploit framework
allows executing “post-exploitation” actions after running an exploit successfully. Typical use
cases involve migrating a Meterpreter shell to a 64-bit process or adding network routes to internal
networks.</p>
        <p>In this paper, we use deductive rules to chain dependencies across activities. We think that
this is an improvement over the aforementioned strategies in two ways: (a) it allows arbitrary
rules to be chained, thus also modeling uncommon attack patterns; (b) it is more explainable, since
rules are explicitly defined and unambiguous.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Solving isomorphic systems</title>
        <p>To the best of our knowledge, no existing tool, library or framework can address the challenges
behind isomorphic systems efficiently. Most of them require manual human intervention in the
form of a slightly different configuration or input parameter. Some tools offer limited, domain
specific support. For example, SQLmap automatically probes any parameter for potential SQL
injections, and can even probe HTTP headers if instructed properly. Unfortunately, a user
still needs to input the specific URL manually. The Pwntools library is capable of solving
specific classes of memory corruption vulnerabilities. The angr framework provides methods
for analyzing whole classes of software binaries automatically, but it needs to be explicitly
programmed to do so. Armitage’s Hail Mary function seems to be the only viable option for
solving isomorphic systems, since it tries every possible exploit available over every exposed
service. However, as remarked in the previous subsections, it is highly inefficient, noisy and
non-recursive. Goals are limited to what Metasploit modules can offer.</p>
        <p>In contrast, our approach combines deductive rules and the discovery of new facts from
command outputs to lay down the basis for a more general attack pattern discovery that is
independent of the specific system configuration.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed architecture</title>
      <sec id="sec-3-1">
        <title>3.1. High level design</title>
        <p>The Inferential engine is a Prolog interpreter responsible for interacting with the knowledge
base through proper Prolog queries and collecting the corresponding answers. A user formulates
questions about specific system properties by translating them into Prolog queries that check
for the existence of specific facts (e.g. “is this system accessible through SSH?”, “is there an
IP address for which an accessible SSH server is available?”). The inferential engine recursively
applies deductive rules related to these facts and de facto splits the query into several, easier
subqueries, until these can be trivially answered by already existing facts. The answers to those
subqueries are used to construct the final answer, which might be true, false, or a specific value
satisfying the requested property (e.g. the IP address of the above-mentioned system). We note
that this approach applies not only to penetration tests, but to any security assessment that
can be logically split into inter-dependent tasks. These include source code security audits,
defensive checks and vulnerability research. Furthermore, using rules allows arbitrary
goals to be expressed in the form of questions.</p>
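        <p>The recursive splitting of a query into subqueries can be sketched in a few lines of Python. This is only an illustration of backward chaining over ground (variable-free) statements; it is not the Prolog engine used by the prototype, which additionally performs unification and backtracking over variables. The prove() function, the depth limit and the example facts and rules are hypothetical.</p>
        <preformat><![CDATA[
```python
def prove(goal, facts, rules, depth=0, max_depth=20):
    """Return True if `goal` is a known fact, or follows from a rule whose
    preconditions can all be proven in turn (depth-limited backward chaining)."""
    if goal in facts:
        return True
    if depth >= max_depth:
        return False  # guard against rules that refer to each other forever
    for head, body in rules:
        if head == goal and all(
            prove(sub, facts, rules, depth + 1, max_depth) for sub in body
        ):
            return True
    return False


# Hypothetical knowledge base, written as plain strings for simplicity.
facts = {"tcp_service(10.10.10.48,22,ssh)", "ssh_creds(10.10.10.48)"}
rules = [
    ("ssh_access(10.10.10.48)",
     ["ssh_creds(10.10.10.48)", "tcp_service(10.10.10.48,22,ssh)"]),
    ("shell(10.10.10.48)", ["ssh_access(10.10.10.48)"]),
]

print(prove("shell(10.10.10.48)", facts, rules))
print(prove("root(10.10.10.48)", facts, rules))
```
]]></preformat>
        <p>The first query is answered positively by reducing the shell goal to the SSH access goal and then to two existing facts; the second fails because no fact or rule mentions it.</p>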
        <p>The approach described so far is static in nature; an operator has to manually input all facts after
having verified them through external procedures. We overcome this limitation by binding
deductive rules to external commands that are executed whenever the rule is evaluated. The Task
runner takes as input a task specification (a template command and its parameters, usually fact
attributes), assembles a UNIX/Windows command and executes it. Upon command completion
it returns a (command output, exit status) tuple. The Fact parser receives this tuple, analyzes it
and outputs new Prolog facts that enrich the knowledge base.</p>
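        <p>A minimal Python sketch of the task runner and fact parser contract is given below, under stated assumptions. The run_task() and parse_facts() functions, the template syntax based on str.format placeholders and the stand-in command are all hypothetical; the sketch only shows the shape of the (command output, exit status) tuple and how a parser could turn it into a fact. To keep it runnable everywhere, the task simply asks the Python interpreter to echo its argument instead of running a real scanner.</p>
        <preformat><![CDATA[
```python
import subprocess
import sys


def run_task(program, template, params):
    """Fill each template argument with the fact attributes, run the command and
    return the (command output, exit status) tuple described in the text."""
    argv = [program] + [arg.format(**params) for arg in template]
    done = subprocess.run(argv, capture_output=True, text=True)
    return done.stdout, done.returncode


def parse_facts(result, host):
    """Hypothetical fact parser: emit a HOST_ALIVE fact only if the task succeeded."""
    output, status = result
    if status != 0:
        return []  # simplest failure strategy: produce no facts
    return ["HOST_ALIVE('{}').".format(host)]


# Stand-in task: the interpreter echoes the {ip} parameter back on stdout.
template = ["-c", "import sys; print(sys.argv[1])", "{ip}"]
result = run_task(sys.executable, template, {"ip": "10.10.10.48"})
print(result)
print(parse_facts(result, "10.10.10.48"))
```
]]></preformat>
        <p>Passing the command as an argument list rather than a shell string avoids quoting problems when fact attributes contain spaces or shell metacharacters.</p>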
        <p>The Orchestrator connects all the aforementioned components together and defines a set of
primitives that may be used to carry out the various procedures involved in a security assessment.
For reasons of space, we briefly summarize the main interfaces available and discuss the
most relevant ones in Section 3.2.</p>
        <p>Knowledge base management. CRUD operations (Create, Read, Update, Delete) for rules
and facts, load and save a specific Prolog program, load external Prolog modules.</p>
        <p>Reasoning. Ask a specific question through a query and obtain an answer, explain a query,
obtain query analytics (execution time and steps).</p>
        <p>Task execution. Start a task, interrupt a task, obtain status code and output of a task.</p>
        <p>Fact parsing. Start a parsing activity, interrupt a parsing activity, obtain facts.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Implementation details</title>
        <p>In this section we discuss the most relevant implementation details of the proposed system. As
we will see shortly, almost all modules are implemented in Prolog, the only exception being
the fact parsers which are written in Python. The inferential engine used to implement the
knowledge base is the popular SWI-Prolog [31]. The knowledge base is implemented as a plain
SWI-Prolog program file containing facts and deductive rules. Facts are written as statements
relating their arguments, as follows:</p>
        <preformat>HOST_ALIVE('10.10.10.48').
TCP_SERVICE('10.10.10.48',22,ssh,'OpenSSH 6.7p1').</preformat>
        <p>These facts state that a host is reachable at the IPv4 address 10.10.10.48 and that it hosts a specific
version of the OpenSSH SSH server. An operator could of course choose another naming
convention for facts, as long as it stays consistent across the entire knowledge base. On the
other hand, deductive rules require that different facts match simultaneously:</p>
        <preformat>SSH_ACCESS(X) :-
    SSH_CREDS(X,Y,Z),
    TCP_SERVICE(X,22,ssh,_).</preformat>
        <p>The interpretation of this rule is “we have SSH access on X if the vulnerable system exposes an
SSH server on TCP port 22 and we have valid SSH credentials”. The :- operator separates the
head of the deductive rule from its preconditions. The _ character is a wildcard that matches anything in the
corresponding field. In this specific case, the SSH banner is ignored in the rule matching process,
since it is irrelevant. When the Prolog engine evaluates this rule, it tries to bind the X, Y and Z
variables to the arguments of existing facts. If a match is found, a new SSH_ACCESS(X)
fact is generated with X bound to the correct IP address.</p>
        <p>The previous rule is static in nature; it assumes that if the preconditions hold, SSH access is
granted, without actually checking it. We can transform this static rule into a dynamic one
by attaching an external command that is executed whenever the rule is evaluated:</p>
        <preformat>SSH_ACCESS(X) :-
    SSH_CREDS(X,Y,Z),
    TCP_SERVICE(X,22,ssh,_),
    % Build the user@host target, then run: sshpass -p Z ssh Y@X id
    atomic_list_concat([Y,'@',X], Target),
    process_create(path(sshpass), ['-p', Z, ssh, Target, id],
                   [process(PID)]),
    process_wait(PID, E),
    % Succeed only if the remote command exited with status 0
    E == exited(0).</preformat>
        <p>The interpretation of this rule is “we have SSH access on X if the vulnerable system exposes
an SSH server on TCP port 22 and we have valid SSH credentials and those credentials can
actually be used to execute a command remotely through SSH”. The process_create() library
predicate spawns a new process that validates the SSH credentials by trying to launch the id
command remotely, automating password input through the sshpass command. The
process(PID) option retrieves the process ID, which is used by the process_wait() predicate
to wait synchronously for its termination. The exit status is bound to the E variable; if the exit
code is 0 and every other precondition matches, the rule holds and a new SSH_ACCESS(X)
fact is produced with X bound to the correct IP address.</p>
        <p>The previous rule basically validates a new fact by actually checking it, and expands the
knowledge base with it. However, on several occasions an external command might generate a
plethora of new facts. A classic example is a network scan with the nmap command that produces
a set of TCP_SERVICE facts (one for each new TCP service discovered). In this preliminary
prototype, we pipe the output of such an external command to a filter written in Python 3. The filter takes
as input the output of the external command, analyzes it and outputs a series of facts,
which the Prolog environment adds to the knowledge base. For example, given the
following nmap output (simplified for space reasons):</p>
        <preformat>PORT   STATE SERVICE REASON         VERSION
22/tcp open  ssh     syn-ack ttl 63 OpenSSH 6.7p1
53/tcp open  domain  syn-ack ttl 63 dnsmasq 2.76
80/tcp open  http    syn-ack ttl 63 lighttpd 1.4.35</preformat>
        <p>the corresponding Python filter would produce the following output:</p>
        <preformat>TCP_SERVICE('10.10.10.48',22,ssh,'OpenSSH 6.7p1').
TCP_SERVICE('10.10.10.48',53,dns,'dnsmasq 2.76').
TCP_SERVICE('10.10.10.48',80,http,'lighttpd 1.4.35').</preformat>
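        <p>A filter of this kind can be sketched in Python as follows. This is not the prototype's actual parser: the regular expression, the nmap_to_facts() function and the SERVICE_ALIASES table (which renames nmap's domain service to dns so that the sketch reproduces the facts listed above) are assumptions made for illustration, and the sketch only handles open TCP ports in the simplified report format shown in the text.</p>
        <preformat><![CDATA[
```python
import re

# Matches lines such as "22/tcp open ssh syn-ack ttl 63 OpenSSH 6.7p1".
PORT_LINE = re.compile(
    r"^(?P<port>\d+)/tcp\s+open\s+(?P<service>\S+)\s+"
    r"(?P<reason>\S+(?:\s+ttl\s+\d+)?)\s+(?P<version>.+)$"
)

# Assumed renaming, chosen so that the sketch reproduces the facts in the text.
SERVICE_ALIASES = {"domain": "dns"}


def nmap_to_facts(host, nmap_output):
    """Turn the open-port lines of an nmap report into TCP_SERVICE facts."""
    facts = []
    for line in nmap_output.splitlines():
        match = PORT_LINE.match(line.strip())
        if match is None:
            continue  # header lines and anything that is not an open TCP port
        service = SERVICE_ALIASES.get(match["service"], match["service"])
        facts.append("TCP_SERVICE('{}',{},{},'{}').".format(
            host, match["port"], service, match["version"].strip()))
    return facts


sample = """PORT   STATE SERVICE REASON         VERSION
22/tcp open  ssh     syn-ack ttl 63 OpenSSH 6.7p1
53/tcp open  domain  syn-ack ttl 63 dnsmasq 2.76
80/tcp open  http    syn-ack ttl 63 lighttpd 1.4.35"""

for fact in nmap_to_facts("10.10.10.48", sample):
    print(fact)
```
]]></preformat>
        <p>A production filter would also need to escape single quotes inside version banners before emitting them as quoted Prolog atoms; the sketch leaves this out for brevity.</p>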
        <p>Several strategies might be applied in case a task fails. The simplest one does not produce facts;
more advanced ones generate facts that signal failures at a specific time, for example to keep
track of errors. For the sake of simplicity, in this preliminary version of our prototype each task
has its own dedicated fact parser and no facts are produced in case of task failure.</p>
        <p>We introduce as an example the HOST_SCANNED() rule, which holds after a network
scan executes successfully:</p>
        <preformat>HOST_SCANNED(X) :-
    % Run the wrapper on host X and capture its standard output
    process_create(path('nmap_wrap.py'), [X],
                   [stdout(pipe(Stream))]),
    read_stream_to_codes(Stream, Codes),
    close(Stream),
    string_codes(Output, Codes),
    % One fact per line; assert each of them
    split_string(Output, "\n", "\n", Lines),
    maplist(assert_from_string, Lines).</preformat>
        <p>The interpretation of this rule is “host X is scanned if nmap could be executed correctly through
its filter and the output transformed and evaluated to a set of new facts”. Here we execute
a nmap_wrap.py Python 3 script that runs nmap, reads its output and produces the desired
facts. The read_stream_to_codes() predicate reads the output from the pipe created with
the stdout(pipe(Stream)) option. The string_codes() predicate converts the character codes
read from the stream into a string (the stream is decoded with the system character encoding, usually UTF-8). The split_string()
predicate splits the filter output (a sequence of facts, one fact per line) into a list of fact strings.
Finally, maplist() applies assert_from_string() to each fact string, de
facto inserting it into the knowledge base.</p>
        <p>The main interface to the aforementioned functions is SWI-Prolog’s REPL (Read Evaluate
Print Loop). Listing all facts and rules in the knowledge base is trivial through the listing
predicate:</p>
        <preformat>listing.</preformat>
        <p>It is also possible to list specific facts or rules by passing them as a parameter to the listing()
predicate:</p>
        <preformat>listing(HOST_ALIVE).
listing(HOST_SCANNED).</preformat>
        <p>Querying for a specific goal is as easy as querying a Prolog fact. For example, to find all alive
hosts we can query all known HOST_ALIVE() facts:</p>
        <preformat>HOST_ALIVE(X).</preformat>
        <p>The Prolog engine tries to bind all known HOST_ALIVE parameters to the X variable. One
possible answer would be:</p>
        <preformat>X = '10.10.10.48'</preformat>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental evaluation</title>
      <p>In Section 4.1 we define the testbed (hardware, operating system, software) and the systems
assessed during the experiments. In the remainder of Section 4 we compare our prototype with some popular
tools and frameworks available to operators. The main goal is to assess the strengths and weaknesses
of the prototype.</p>
      <sec id="sec-4-1">
        <title>4.1. Testbed</title>
        <p>The testbed architecture running our prototype is based on off-the-shelf hardware and software
components. The logic engine is based on SWI-Prolog 9.0.3. All software components are
written in Python 3, running on a Python 3.10.9 interpreter. The operating environment is
GNU/Linux (specifically, the Arch software distribution). The prototype executes some external
commands used during penetration testing activities; those programs have been installed from
the corresponding GitHub repositories. In our preliminary evaluation we consider the following
systems, which encompass a wide range of environments (Web, UNIX, binary).</p>
        <p>Web for Pentester. Web for Pentester [32] is a live CD intended to run as a virtual machine. It
hosts a series of simple Web-based challenges. Our prototype targets the XSS, SQL injection,
Directory Traversal, File Include, Code Injection and Command Injection challenges. These are
fairly standard and are solvable with popular tools.</p>
        <p>Nebula. Nebula [33] is a live CD intended to run as a virtual machine. It hosts a series of
simple UNIX-based privilege escalation challenges in capture-the-flag style. Our prototype
targets the Level00 (identification and execution of SETUID binaries) and Level02 (environment
variable command injection) challenges. These are not solvable with popular tools and require
custom scripts.</p>
        <p>Vulnhub Kioptrix. Kioptrix [34, 35] is a series of live CDs intended to run as virtual machines.
They are boot-to-root challenges, where one user is supposed to obtain a foothold and escalate
privileges to root. The prototype targets Kioptrix1 (which exposes a vulnerable SMB server that
yields remote code execution as root) and Kioptrix2 (which exposes a Web server vulnerable to
SQL and command injection, and hosts a vulnerable Linux kernel). Kioptrix1 is automatically
solvable with Armitage, while Kioptrix2 is not. We also built a clone of Kioptrix2 with different
configuration parameters (HTTP server port on TCP port 8888, vulnerable URLs /login.php
and ping.php, query string parameter command) to evaluate tools against structurally similar
attack patterns.</p>
        <p>HTB Mirai. Mirai [36] is a boot-to-root machine hosted by the HACKTHEBOX gaming
platform. It models a Raspberry Pi vulnerable to the attacks of the Mirai botnet (default
credentials, user in the “disk” group that can read the whole file system).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Limitation of existing tools</title>
        <p>Web for Pentester challenges are easily solved by common tools such as SQLmap, XSSer and
commix. However, these tools are domain-specific and only help the operator in solving subtasks.
They do not aid in building an attack plan, nor do they allow the operator to set custom goals.
Curiously, Armitage cannot solve these challenges due to the lack of CVEs on the Web server.</p>
        <p>We could not solve any of the Nebula challenges with popular tools and frameworks. We had
to resort to custom scripts that use the popular pwntools library. While operators could write
custom scripts to automate any conceivable assessment, we note that (a) the coding effort is
often considerable, (b) the resulting code is often not modular and, thus, not easily reusable in
other engagements, and (c) goals are hardcoded and attack planning is simply not possible.</p>
        <p>Tools and frameworks help only partially in solving boot-to-root machines and need heavy
human orchestration. The only exception is Kioptrix1, which hosts a structural CVE on SMB,
allowing the operator to escalate privileges to administrator. Armitage is capable of solving
this box with its Hail Mary attack. Kioptrix2 (and its isomorphic clone) can only be partially
exploited in an automated fashion (due to the structural exploit on the Linux kernel). Obtaining
a foothold on this machine requires orchestrating SQL and command injection tools manually or
through custom scripts.</p>
        <p>No single tool or framework can automate the assessment of Mirai. This machine can
be solved through a custom script, which would be very complex and would embed a hardcoded,
non-reusable attack plan.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Behavior of the proposed prototype</title>
        <p>We prepared an initial knowledge base with all TTPs needed to solve the challenges presented
in Section 4.1. The proposed prototype is able to solve them all automatically and can print the
attack plan of each assessment. The only input needed is the system’s entry point (IP address
or URL). Contrary to popular tools and frameworks, the knowledge base can be reused as is in
similar security assessments.</p>
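        <p>As an illustration of how resolving a goal can yield an attack plan, the following toy sketch
performs backward chaining over propositional rules and records the derived goals in order. It is
a Python stand-in and not the prototype itself, which relies on the SWI-Prolog engine. All rule and
fact names in the sketch (except HOST_PWNED) are hypothetical, and the rules do not reproduce the
TTPs in our knowledge base.</p>
        <preformat><![CDATA[
```python
# Toy backward chaining over propositional rules. The rule set is hypothetical
# and only loosely mirrors the naming style used in the paper.

RULES = {
    "HOST_PWNED": [["SHELL_AS_USER", "KERNEL_VULNERABLE"]],
    "SHELL_AS_USER": [["WEB_LOGIN_BYPASSED", "COMMAND_INJECTION"]],
    "WEB_LOGIN_BYPASSED": [["HTTP_OPEN", "SQL_INJECTION"]],
}


def prove(goal, facts, plan, pending=()):
    """Return True if goal follows from facts; append derived goals to plan."""
    if goal in facts:
        return True
    if goal in pending:  # avoid looping on cyclic rules
        return False
    for body in RULES.get(goal, []):
        steps = []
        if all(prove(sub, facts, steps, pending + (goal,)) for sub in body):
            plan.extend(steps)
            plan.append(goal)
            facts.add(goal)  # feed the new fact back into the knowledge base
            return True
    return False


known = {"HTTP_OPEN", "SQL_INJECTION", "COMMAND_INJECTION", "KERNEL_VULNERABLE"}
plan = []
if prove("HOST_PWNED", known, plan):
    print(" -> ".join(plan))
```
]]></preformat>
        <p>In this sketch the plan is simply the order in which intermediate goals become facts; a missing
initial fact makes the goal unprovable and leaves the plan empty.</p>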
        <p>Let us discuss some details about the Kioptrix2 boot-to-root machine. Our prototype found
the attack plan presented in Figure 2. The graph has been pruned for space reasons and only
the paths relevant to privilege escalation are shown. The same attack plan was
produced when assessing the isomorphic clone of Kioptrix2. We also defined rules to query
non-standard goals, such as finding all PDF files after having escalated privileges to administrator
and having found valid SSH credentials. We use a find_pdf_wrap.py filter that executes find /
-type f -name *.pdf remotely through sshpass, reads the output and transforms it into
several PDF_PATH(X, PATHNAME) facts.</p>
        <p>PDF_FILES(X) :-
HOST_PWNED(X),
SSH_CREDS(X, root, Z),
process_create(path(find_pdf_wrap.py),
['find_pdf_wrap.py', X, root, Z],
[process(PID), stdout(pipe(Stream))]),
read_stream_to_codes(Stream, Codes),
close(Stream),
string_codes(Output, Codes),
split_string(Output, "\n", "\n", Lines),
maplist(assert_from_string, Lines).</p>
        <p>When queried, the corresponding output is:</p>
        <p>PDF_PATH('10.10.10.48', '/root/Documents/Assessment.pdf').
PDF_PATH('10.10.10.48', '/root/Documents/Invoice.pdf').</p>
        <p>...</p>
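        <p>The transformation performed by a filter such as find_pdf_wrap.py can be sketched as follows.
This is an assumption-laden illustration, not the script used in the prototype: the helper names
build_command(), quote_atom() and facts_from_find() are hypothetical, and the search pattern is
quoted here so that the remote shell does not expand it. Only the PDF_PATH fact name and the use of
sshpass come from the text above.</p>
        <preformat><![CDATA[
```python
# Hypothetical sketch of a filter such as find_pdf_wrap.py; the real script is
# not shown in the paper. Only the fact name PDF_PATH comes from the text.

# Assumption: the pattern is quoted so the remote shell passes it to find as is.
FIND_COMMAND = "find / -type f -name '*.pdf'"


def build_command(host, user, password):
    """Argument vector for running find remotely through sshpass and ssh."""
    return ["sshpass", "-p", password, "ssh", "%s@%s" % (user, host), FIND_COMMAND]


def quote_atom(text):
    """Quote text as a Prolog atom, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def facts_from_find(host, output):
    """One PDF_PATH fact string per non-empty line of find output."""
    return [
        "PDF_PATH(%s, %s)." % (quote_atom(host), quote_atom(path))
        for path in output.splitlines()
        if path.strip()
    ]


sample = "/root/Documents/Assessment.pdf\n/root/Documents/Invoice.pdf\n"
for fact in facts_from_find("10.10.10.48", sample):
    print(fact)
```
]]></preformat>
        <p>The command built by build_command() is not executed in this sketch; an actual filter would
run it, capture its standard output and pass that text to facts_from_find().</p>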
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and future work</title>
      <p>Proper testing of systems is paramount to reduce the risk behind cyber attacks. Security
researchers and practitioners have spent most of their efforts in different avenues of work: (a)
organizing knowledge through multiple sources of information (both structured and
unstructured); (b) modeling attack plans through popular data structures (graphs, trees) and using
machine learning techniques; (c) writing domain-specific tools, software libraries and frameworks
that aid an operator in enumerating systems, finding and exploiting vulnerabilities. However,
we feel that previous efforts fall short in several aspects: (a) they are heavily oriented to exploiting
public CVEs and basic vulnerabilities, completely neglecting chained ones; (b) they mainly leave
to the operator the burden of designing attack plans and specifying non-trivial assessment goals;
(c) they still exhibit a limited capability of self-discovering future avenues of attack efficiently;
(d) they need reconfiguration even in the presence of isomorphic systems.</p>
      <p>In this paper we present a human-assisted framework that tries to address these challenges.
We investigate the possibility of treating a security assessment as a mathematical theorem to
be proved. Our proposal is based on a Prolog-based expert system with facts, deductive rules
and associated test templates. A human operator asks the system if a specific goal is achievable.
Applying deductive reasoning, our prototype tries to solve the goal by building an attack plan.
New facts are discovered and fed back into the knowledge base as the assessment proceeds.</p>
      <p>Our preliminary experimental results show that the proposed approach can address the
following challenges: (a) reaching non-standard goals (which would be missed by most tools
and frameworks); (b) solving isomorphic systems without the need for reconfiguration; (c)
identifying vulnerabilities from chained weaknesses and exposures.</p>
      <p>Our proposal is far from perfect. Maintaining a coherent naming scheme for facts and
rules becomes hard as their volume increases. Rule expressiveness is also an issue since Prolog
syntax is far from simple. We plan to extend our work in different directions. We will assess
the scalability of a single knowledge base and investigate the possibility of implementing a
hierarchy of specialized knowledge bases. Furthermore, we will define different evaluation
metrics (such as number of attack steps and attack tree depth) to better understand the quality
of the attack plan generated by our prototype. We plan to add the concept of priority to single
tasks to prefer specific avenues to others. Finally, we will evaluate our prototype on even larger
networks involving lateral movements and privilege escalations across several users.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>This work was partially supported by project SERICS (PE00000014) under the MUR National
Recovery and Resilience Plan funded by the European Union – NextGenerationEU.</p>
      <p>[11] C. Polop, PEASS-ng - Privilege Escalation Awesome Scripts SUITE new generation, https://github.com/carlospolop/PEASS-ng/tree/master/winPEAS, 2019.</p>
      <p>[12] K. Reitz, Requests: HTTP for Humans, https://requests.readthedocs.io/, 2011.</p>
      <p>[13] M. Svensson, Pwntools - CTF toolkit, https://github.com/Gallopsled/pwntools, 2013.</p>
      <p>[14] A. Solino, Impacket, https://www.secureauth.com/labs/open-source-tools/impacket/, 2012.</p>
      <p>[15] A. Robbins, R. Vazarkar, W. Schroeder, BloodHound: Six Degrees of Domain Admin, https://bloodhound.readthedocs.io/en/latest/, 2019.</p>
      <p>[16] Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel, et al., SoK: (State of) the art of war: Offensive techniques in binary analysis, in: 2016 IEEE Symposium on Security and Privacy (SP), IEEE, 2016, pp. 138–157.</p>
      <p>[17] H. D. Moore, Metasploit - The world’s most used penetration testing framework, https://www.metasploit.com/, 2003.</p>
      <p>[18] R. Mudge, Armitage - Cyber Attack Management for Metasploit, https://github.com/rsmudge/armitage, 2015.</p>
      <p>[19] B. Schneier, Attack trees, Dr. Dobb’s Journal 24 (1999) 21–29.</p>
      <p>[20] B. Kordy, P. Kordy, S. Mauw, P. Schweitzer, Adtool: security analysis with attack–defense trees, in: International Conference on Quantitative Evaluation of Systems, Springer, 2013, pp. 173–176.</p>
      <p>[21] S. A. Zonouz, H. Khurana, W. H. Sanders, T. M. Yardley, Rre: A game-theoretic intrusion response and recovery engine, IEEE Transactions on Parallel and Distributed Systems 25 (2013) 395–406.</p>
      <p>[22] X. Ou, W. F. Boyer, M. A. McQueen, A scalable approach to attack graph generation, in: Proceedings of the 13th ACM Conference on Computer and Communications Security, ACM, 2006, pp. 336–345.</p>
      <p>[23] N. Poolsappasit, R. Dewri, I. Ray, Dynamic security risk management using bayesian attack graphs, IEEE Transactions on Dependable and Secure Computing 9 (2011) 61–74.</p>
      <p>[24] S. Chaudhary, A. O’Brien, S. Xu, Automated post-breach penetration testing through reinforcement learning, in: 2020 IEEE Conference on Communications and Network Security (CNS), IEEE, 2020, pp. 1–2.</p>
      <p>[25] F. M. Zennaro, L. Erdodi, Modeling penetration testing with reinforcement learning using capture-the-flag challenges: trade-offs between model-free learning and a priori knowledge, arXiv preprint arXiv:2005.12632 (2020).</p>
      <p>[26] M. C. Ghanem, T. M. Chen, Reinforcement learning for intelligent penetration testing, in: 2018 Second World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4), IEEE, 2018, pp. 185–192.</p>
      <p>[27] R. Maeda, M. Mimura, Automating post-exploitation with deep reinforcement learning, Computers &amp; Security 100 (2021) 102108.</p>
      <p>[28] D. R. McKinnel, T. Dargahi, A. Dehghantanha, K.-K. R. Choo, A systematic literature review and meta-analysis on artificial intelligence in penetration testing and vulnerability assessment, Computers &amp; Electrical Engineering 75 (2019) 175–188. URL: https://www.sciencedirect.com/science/article/pii/S0045790618315489. doi:https://doi.org/10.1016/j.compeleceng.2019.02.022.</p>
      <p>[29] C. Sullo, Nikto, https://cirt.net/Nikto2, 2001.</p>
      <p>[30] G. Lyon, Nmap: the Network Mapper, https://nmap.org/, 1997.</p>
      <p>[31] J. Wielemaker, An overview of the SWI-Prolog programming environment, in: Proceedings of the 13th International Workshop on LP Environments, 2003.</p>
      <p>[32] PentesterLab, Web for Pentester, https://pentesterlab.com/exercises/web_for_pentester/course, 2012.</p>
      <p>[33] A. Griffiths, Nebula, https://exploit.education/nebula/, 2019.</p>
      <p>[34] Kioptrix, Kioptrix Level 1, https://www.vulnhub.com/entry/kioptrix-level-1-1,22/, 2015.</p>
      <p>[35] Kioptrix, Kioptrix Level 2, https://www.vulnhub.com/entry/kioptrix-level-11-2,23/, 2015.</p>
      <p>[36] H. Pylarinos, J. Hooker, A. Zikopoulos, HACKTHEBOX, https://www.hackthebox.com/, 2017.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lampesberger</surname>
          </string-name>
          ,
          <article-title>Technologies for web and cloud service interaction: a survey</article-title>
          ,
          <source>Service Oriented Computing and Applications</source>
          <volume>10</volume>
          (
          <year>2016</year>
          )
          <fpage>71</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jameel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shahzad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zafar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sajid</surname>
          </string-name>
          ,
          <article-title>The users experience quality of responsive web design on multiple devices</article-title>
          ,
          <source>in: Proceedings of the 2nd International Conference on Future Networks and Distributed Systems</source>
          , ACM,
          <year>2018</year>
          , p.
          <fpage>69</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stuart</surname>
          </string-name>
          ,
          <article-title>Systematically understanding the cyber attack business: A survey</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>51</volume>
          (
          <year>2018</year>
          )
          <fpage>70</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Bowen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Devarajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stolfo</surname>
          </string-name>
          ,
          <article-title>Measuring the human factor of cyber security</article-title>
          ,
          <source>in: 2011 IEEE International Conference on Technologies for Homeland Security (HST)</source>
          , IEEE,
          <year>2011</year>
          , pp.
          <fpage>230</fpage>
          -
          <lpage>235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Maglaras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Janicke</surname>
          </string-name>
          ,
          <article-title>Heart-is: A novel technique for evaluating human error-related information security incidents</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>80</volume>
          (
          <year>2019</year>
          )
          <fpage>74</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kraemer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Carayon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clem</surname>
          </string-name>
          ,
          <article-title>Human and organizational factors in computer and information security: Pathways to vulnerabilities</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>28</volume>
          (
          <year>2009</year>
          )
          <fpage>509</fpage>
          -
          <lpage>520</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.</given-names>
            <surname>Knowles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Prince</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hutchison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F. P.</given-names>
            <surname>Disso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <article-title>A survey of cyber security management in industrial control systems</article-title>
          ,
          <source>International Journal of Critical Infrastructure Protection</source>
          <volume>9</volume>
          (
          <year>2015</year>
          )
          <fpage>52</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Vellaithurai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zonouz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Berthier</surname>
          </string-name>
          ,
          <article-title>Cpindex: Cyber-physical vulnerability assessment for power-grid infrastructures</article-title>
          ,
          <source>IEEE Transactions on Smart Grid</source>
          <volume>6</volume>
          (
          <year>2014</year>
          )
          <fpage>566</fpage>
          -
          <lpage>575</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Nurse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Creese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>De Roure</surname>
          </string-name>
          ,
          <article-title>Security risk assessment in internet of things systems</article-title>
          ,
          <source>IT Professional</source>
          <volume>19</volume>
          (
          <year>2017</year>
          )
          <fpage>20</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Damele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stampar</surname>
          </string-name>
          ,
          <article-title>SQLmap - Automatic SQL injection and database takeover tool</article-title>
          , https://it.wikipedia.org/wiki/Sqlmap,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>