<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Adaptive Experimental Design for Intrusion Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicholas R. Jennings</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff1">
          <institution>Imperial College London</institution>
        </aff>
        <contrib contrib-type="author">
          <string-name>Kate Highnam</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zach Hanif</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ellie Van Vogt</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sonali Parbhoo</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergio Maffeis</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Loughborough University</institution>
        </aff>
      </contrib-group>
      <kwd-group>
        <kwd>Intrusion Research</kwd>
        <kwd>Data Collection</kwd>
        <kwd>Honeypots</kwd>
        <kwd>Experimental Design</kwd>
        <kwd>Randomized Control Trials</kwd>
      </kwd-group>
      <abstract>
        <p>Intrusion research frequently collects data on attack techniques currently employed and their potential symptoms. This includes deploying honeypots, logging events from existing devices, employing a red team for a sample attack campaign, or simulating system activity. However, these observational studies do not clearly discern the cause-and-effect relationships between the design of the environment and the data recorded. Neglecting such relationships increases the chance of drawing biased conclusions due to unconsidered factors, such as spurious correlations between features and errors in measurement or classification. In this paper, we present the theory and empirical data on methods that aim to discover such causal relationships efficiently. Our adaptive design (AD) is inspired by the clinical trial community: a variant of a randomized control trial (RCT) to measure how a particular “treatment” affects a population. To contrast our method with observational studies and RCT, we run the first controlled and adaptive honeypot deployment study, identifying the causal relationship between an ssh vulnerability and the rate of server exploitation. We demonstrate that our AD method decreases the total time needed to run the deployment by at least 33%, while still confidently stating the impact of our change in the environment. Compared to an analogous honeypot study with a control group, our AD requests 17% fewer honeypots while collecting 19% more attack recordings.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Automated cyber intrusion attacks continuously scan and probe internet-connected systems [
        <xref ref-type="bibr" rid="ref1 ref2">1,
2</xref>
        ]. The state of the art in cyber intrusion defenses employs observational techniques augmented
with automated statistical techniques, including temporal point processes and machine learning.
This approach has been effective, but it is susceptible to a variety of biases that can mislead
such solutions or keep them from generalizing and learning quickly [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. We aim to limit the impact
of potential bias by improving the datasets that train intrusion detection methods.
      </p>
      <p>
        Intrusion datasets can be acquired from third party vendors or compiled by recording logs
from existing, simulated, or newly deployed research infrastructure [
        <xref ref-type="bibr" rid="ref10 ref5 ref6 ref7 ref8 ref9">5, 6, 7, 8, 9, 10</xref>
        ].
Conventionally, these methods provide observational data, containing information on current
attacks implemented against the given systems and how to observe the attacks in the given
environment. However, even in large volumes, observational data has a high potential for bias
due to uncontrolled characteristics, including possible spurious correlations between variables
and outcomes or measurement error [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref4 ref9">11, 4, 12, 9, 13</xref>
        ].
      </p>
      <p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        To limit potential erroneous conclusions by both statistical models and researchers, we
explore intrusion data collection with a control group: a collection of studied systems that
are left unaltered to compare against identical systems that have been altered. Our usage of control
groups in an experimental study draws inspiration from clinical research, one of the oldest
fields conducting control-based studies [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ]. In healthcare, a typical control group study
randomly splits a recruited subset of a population into an untreated group (the control) and a
treated group (the altered version). This is known as a randomized control trial (RCT), the gold standard
for clinical trial methods; its random assignment to groups minimizes the impact of researcher
biases while evaluating causal relationships [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Our method is based on adaptive design
(AD): a variant of RCT that adds pre-planned opportunities to modify aspects of an ongoing trial
in response to data accumulated during the study, without invalidating its integrity [17, 18, 19].
RCT and AD both account for known conditions and unforeseen events (e.g., a pandemic or
war) that might require the trial to end early by separating the trial into multiple stages to
run interim analyses.
      </p>
      <p>Unlike clinical trials with human patients, intrusion research aims to increase the occurrence
of events of interest (i.e., intrusions or exploits). See Table 1 for some of the terminology from
healthcare mapped to security as it is used to define our work. To demonstrate our intrusion-focused
interventional methods, we use a honeypot, a common tool for recording intrusion
data. A honeypot is an intentionally vulnerable system with covert monitoring that is used to
both entice and observe attackers without their knowing [20, pg.7].</p>
      <p>Traditional honeypot deployments - or “vanilla” deployments as we will call them in this
paper - expose a large number of identical vulnerable systems for a particular (extended) length
of time to collect intrusion data [21, 22, 23, 24, 25][20, pg. 20-21]. While sufficiently large and
long-lived vanilla deployments all but guarantee observations and can summarize the general
state of automated threats, they carry several risks and costs that could be unacceptable. If a
meaningful quantity of identical honeypots were left online, it would provide an opportunity
for adversaries to identify the presence of the employed monitoring tools. This can hinder
observations (i.e., bias the data) and render the tools useless (i.e., when adversaries stop acting
after detecting active monitoring or debugging tools). Additionally, large scale deployments
cost time and money, which absorbs budget, and can hinder or preclude timely observations.</p>
      <p>In this paper, we present the first control-based deployment method for honeypots to optimize
resource allocation and limit honeypot exposure. Our method is used in an exemplary study to
determine the impact of an ssh vulnerability on cloud servers across the United States. When
compared to the vanilla deployment method with the same setup, we find that AD can determine
the impact of the vulnerability in 33% of the total trial duration, while limiting the likelihood of
error and requesting 17% fewer honeypots overall. With a control group, our AD collects 19%
more attack recordings than the RCT trial.</p>
      <p>Our contributions in this work are as follows:
• The first adaptive method for a control study in security, optimizing resource allocation
and duration of the study based on the events seen in prior stages and error tolerance.
• The first interventional study using honeypots, demonstrating the effectiveness of our
adaptive method and how it helps attribution of environment changes during data
collection.</p>
      <p>Although we showcase our method with honeypot deployments, it can be used for other control
studies in security applications. For example, one could study the impact of a new spam email
training on the rate of spam emails being opened, or the impact of removing local file inclusion access
on the exploitation rate of a web application hosting other vulnerabilities. Our AD strategy
uses a new interpretation of clinical trial methodologies, encouraging infections rather than
preventing them mid-trial. Additionally, our study ran using automated scripts, presenting
the first fully-automated experimental study. This automation and cheaper application setting
offer future inventors of new clinical trial methodologies a venue to showcase their
improvements, rather than running an expensive trial with patients.</p>
      <p>This paper is structured as follows. Section 2 reviews control studies in security and provides
a brief background on the healthcare-based methods that inspired this work. We then introduce
our new AD method in Section 3 while contrasting it with the vanilla and RCT methods. In
Section 4, we implement our method against the vanilla and RCT methods in an exemplary
honeypot deployment. We conclude and consider how the method might not behave the same
in other settings in Section 5.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Background</title>
      <p>An experimental study starts from a hypothesis on how a change or treatment will alter an
aspect of a given environment [26]. The hypothesis is then tested, in the simplest form, by
observing a control group that is unchanged and comparing it with another (ideally identical)
group that is then changed. An RCT provides some of the strongest evidence of an intervention’s
impact due to its random allocation of participants to treatment arms (there might be multiple
treatments available) or a control arm (standard care or a placebo) [27, 28]. This process removes
potential bias of unaccounted factors in the environment. Participants are then observed and
their outcomes recorded.</p>
      <p>Part of the rigor of RCTs is that all aspects of the trial conduct and (interim) analysis must be
documented prior to the execution of the study. This prospective approach avoids the
introduction of bias from investigators and statisticians mid-trial. An important part of this planning
process is the sample size calculation. Using estimates and prior knowledge of interventions,
trialists (those conducting the trial) can estimate the required number of participants that must
be recruited in order to detect a significant difference in outcomes between the groups [29].
After approximating the needs and impacts of the study, the execution of the study should be
justified.</p>
      <p>For medicine, developing the justification to go from drug discovery to licensing can take an
average of over 10 years [30]; recent advances in trial methodology have been able to improve
the efficiency of trials in order to reduce this time [31]. The conduct and methodology of a
clinical trial are highly regulated because of the direct involvement of patients [28]. However,
such regulation is not often a barrier to research in cyber security.</p>
      <p>Security has several advantages in running experimental studies. Digital infrastructure is
cheap with the advancement of cloud technologies [32]. Our honeypot study could have cost
up to $2,000 USD with 600 participants, compared to the millions of USD required in drug
testing [33]. Digital resources can also be exactly copied as many times as needed, whereas
biological studies must make strong assumptions about the similarity between patients. Security
experimental studies can be quicker to complete if they consider the attacks that occur at a
higher frequency and pace of development than a biological infection or disease.</p>
      <p>In this section, we review previous experimental studies in security settings that consider or
run control groups. We finish this section by briefly highlighting the other techniques developed
to improve the classic experimental design that have spawned from the constrained medical
setting.</p>
      <sec id="sec-3-1">
        <title>2.1. Control Trials in Security</title>
        <p>
          Our work is not the first to apply clinical methodology within security; prior works focus on the
interaction of security and users of digital systems. For example, Simoiu et al. [
          <xref ref-type="bibr" rid="ref17">34</xref>
          ] survey user
awareness of ransomware in the general U.S. population. Lin et al. [
          <xref ref-type="bibr" rid="ref18">35</xref>
          ] analyze spearphishing
cyber attacks and their correlation with various human demographics. A common human-oriented
security study involves antivirus software and how it is used by the lay person [
          <xref ref-type="bibr" rid="ref19 ref20 ref21">36, 37, 38</xref>
          ]. These
works implement an experimental study to find strong indications of how successful antivirus
software can be based on human performance. Yen et al. [
          <xref ref-type="bibr" rid="ref22">39</xref>
          ] further extend this research area
by incorporating the user’s job title and responsibilities to contextualize the impact of malware
within a company. However, we circumvent the recruitment (and cost) of human involvement by
focusing on how these methods can be applied to digital systems with automated, autonomous
threats.
        </p>
        <p>
          Few experimental studies have been published without humans in cyber security. Bošnjak
et al. [
          <xref ref-type="bibr" rid="ref23">40</xref>
          ] prepare an experimental study to systematically evaluate defenses for shoulder
surfing attacks after an extensive literature review. Gil et al. [
          <xref ref-type="bibr" rid="ref24">41</xref>
          ] approach this by using a
case-control study to identify complex relationships of threats to a single host within a large
network. Although it is called a “study,” a case-control study filters and randomly selects data
from purely observational studies for its patient population. There is no interaction with the
data collection process. Causal relations can be learned from such data but there is no control
for error or bias. We discuss how our method controls for error in Section 3.
        </p>
      <p>These studies indicate a major challenge in security experimental studies: the need for human-interpretable
interventions. Recording data in medical settings is relatively straightforward,
e.g., heartbeats per minute or body temperature indicating there is a fever. Understanding how
these indicate a particular disease is also fairly intuitive. But translating host-based logs to
indications of unwanted activity in a system is immensely difficult, let alone stating the type
of unwanted activity. Thus, mapping “symptoms” from sensor logs can be difficult unless we
control our human-level interventions.</p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Advances in Control Trial Methodology</title>
        <p>
          Randomization limits the impact of unknown external factors in influencing a participant’s
chance of receiving a treatment; we therefore expect the baseline characteristics of participants
to be similar between studied groups. If there is a concern that some baseline characteristics
may be prognostic, then we can stratify randomization based on these variables without
loss of statistical strength [
          <xref ref-type="bibr" rid="ref25">42</xref>
          ].
        </p>
        <p>
          Traditional RCTs are known for their rigor and complete pre-specification of procedure. A
common adaptation to an RCT is the inclusion of stopping rules for efficacy, safety, or futility [
          <xref ref-type="bibr" rid="ref26">43</xref>
          ].
If the trial has gathered enough evidence that an intervention is effective, or conversely that
the intervention is harmful, then the study can cease, saving resources for a future study or a
re-run of the same study with corrections. Similarly, interim checks would also catch if there is
not enough evidence of an intervention’s effect to reach a significant conclusion [
          <xref ref-type="bibr" rid="ref27">44</xref>
          ].
        </p>
        <p>Contemporary approaches to running RCTs aim to make the process of evaluating an
intervention faster and more efficient [31]. As mentioned, we implement one such method, adaptive
design (AD). The principle of AD is that it permits certain aspects of a study to be modified
intermittently based on available evidence.</p>
        <p>
          Interim data used to inform stopping decisions can also be used to inform an updated sample
size calculation, or even to update randomization allocation proportions [19]. A well-known
example of this is the REMAP-CAP study, which has many treatments available across multiple
domains for treating community acquired pneumonia, including severe COVID-19, in intensive
care unit settings [
          <xref ref-type="bibr" rid="ref28">45</xref>
          ]. Platform trials have also been used to successfully evaluate many
different types of interventions simultaneously [
          <xref ref-type="bibr" rid="ref29">46</xref>
          ]. Monthly interim analyses are conducted
and a Bayesian model is used to update randomization probabilities for new participants entering
the trial, so that patients are randomized to treatments that are more likely to benefit them [
          <xref ref-type="bibr" rid="ref30">47</xref>
          ].
In the present study, we draw on the concept of response adaptive randomization to optimize
the allocation of honeypots.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Adaptive Design for Security Applications</title>
      <p>In this section, we define our methods for interventional data collection in security applications.
This setting typically studies adversarial effects, i.e., encouraging intrusions as data are collected.
We call the change or treatment (e.g., a drug or surgical procedure) made to a population a
corruption<sup>1</sup>. We shall now enumerate the key terms to document prior to executing a study as
we define our AD method.</p>
      <p>The population considered in the experiment is assumed to be a device or contained system
that is or is not corrupted. Deploying a copy of these devices or systems is the same as recruiting
a patient into a study. The goal of the study is to achieve a set of objectives, evident from
observing a particular event of interest. One can identify an event of interest through the
recorded logs on the population; these events should be clear evidence that the corruption
caused some change in system behavior. For example, if the corruption is a new login website
and we are interested in its effect on attempted SQL injections, then events of interest should
be a record of when an SQL injection occurs.</p>
      <p>The objectives of our methods are always twofold:
(1) confirm evidence of the corruption’s impact within the population;
(2) maximize the recording of events of interest.</p>
      <sec id="sec-4-1">
        <p>Returning to the SQL injection example, if we wanted to collect a diverse range of attacks rather
than automated repeated uses of the same attack, an event of interest could be recorded only
if it has not been seen before.</p>
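This de-duplicated variant of event recording can be sketched as a small filter; `make_recorder` and the string event signatures are hypothetical illustrations, not from the paper.

```python
# Sketch of the de-duplicated variant of recording events of interest:
# an event counts only if its signature has not been seen before.
def make_recorder():
    seen = set()
    def record(event_signature: str) -> bool:
        if event_signature in seen:
            return False      # repeated automated attack: ignored
        seen.add(event_signature)
        return True           # novel event of interest: recorded
    return record

record = make_recorder()
results = [record(e) for e in ["sqli-a", "sqli-a", "sqli-b"]]
```

Here the second occurrence of the same attack signature is dropped, so only diverse attacks count toward the study's events of interest.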
        <p>Before recording events and running the study, it is crucial to accurately define endpoints
to anticipate possible errors or miscalculations. Similar to clinical trials, we recommend setting
an endpoint bounding the trial resources by the given budget. We also recommend stopping
the trial early if the adaptive design tries to deploy a group that is too small. If this occurs in
the early stages of the trial, it can indicate the rates of allocation have converged to nothing
conclusive. This should be followed with a manual review by human experts. While it might
seem inconsequential, it is good practice to list obvious endpoints. This might include recording
an unexpected exploit technique or an overwhelming number of exploits that break data
collection infrastructure. All of these details must be defined prior to the study execution to
maintain the robustness of the trial.</p>
        <sec id="sec-4-1-1">
          <title>3.1. Trial Methodologies</title>
          <p>In this section, we review each method for comparison to our AD before using them in a
honeypot study (Section 4). The pseudo code for the traditional observational study (vanilla),
RCT, and our AD are presented in Methods 1, 2, and 3, respectively. See Appendix A for the
definitions of the functions. The highlighting indicates the similar lines between the algorithms.
Notably, the RCT and AD trials are split into S stages with early stopping - shown as the loops
on line 2 in both Methods 2 and 3, highlighted in pink. Each stage deploys some proportion
of control and corrupted systems, waits for the stage duration, saves the logs, cleans up the
deployment and reviews the logs from the stage to see if an endpoint condition has been reached.
The difference between the standard RCT and our AD is what occurs during the interim update.</p>
          <p><sup>1</sup>We chose corruption to remove the benevolent intentions frequently affiliated with “treatment” from healthcare
settings. A reminder that the translations for other clinical trial terms can be found in Table 1.</p>
          <p>Method 1 Vanilla Observational Study
Input: Budget B, Trial Duration T (in hours)
Ensure: B, T &gt; 0
1: N = GetNumToDeploy(B, T)
2: /* Start Trial */
3: Deploy(control = 0, corrupted = N)
4: Wait(T)
5: logs = SaveLogs()
6: CleanUp()</p>
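As a concrete reading of Method 1, the following Python sketch wires the steps together. The helper bodies and `HOURLY_COST` are assumptions for illustration only; the real functions are defined in Appendix A of the paper.

```python
# A minimal runnable sketch of Method 1 (vanilla observational study).
HOURLY_COST = 0.27  # assumed USD per honeypot-hour; illustrative, not from the paper

def get_num_to_deploy(budget: float, duration_hours: float) -> int:
    """Largest fleet the budget sustains for the whole trial duration."""
    return int(budget // (HOURLY_COST * duration_hours))

def deploy(control: int, corrupted: int) -> None:
    print(f"deploying {control} control, {corrupted} corrupted honeypots")

def wait(hours: float) -> None:
    pass  # a real trial blocks here for the full duration

def save_logs() -> list:
    return []  # would drain the per-region log queue

def clean_up() -> None:
    pass  # terminate every instance

def run_vanilla_trial(budget: float, duration_hours: float) -> list:
    assert budget > 0 and duration_hours > 0
    n = get_num_to_deploy(budget, duration_hours)
    deploy(control=0, corrupted=n)  # every device carries the corruption
    wait(duration_hours)
    logs = save_logs()
    clean_up()
    return logs
```

With the assumed hourly cost, a $650 budget over 12 hours yields a fleet of 200 honeypots, matching the scale of the study in Section 4; no control devices are ever deployed.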
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Method 2 Randomized Control Trial</title>
        <p>Input: Budget B, alpha α, beta β, Number of Stages S,
Stage duration t (hours), proportion of interesting
events happening in control p1, proportion of
interesting events happening in corrupted p2
Ensure: B, S, t &gt; 0 ; 0.0 &lt; α, β &lt; 1.0
1: n1, n2 = PowerAnalysis(p1, p2, α, β)
2: for i = 0; i &lt; S; i += 1 do
3: /* Start Stage of Trial: n1 == n2 */
4: Deploy(control = n1, corrupted = n2)
5: Wait(t)
6: logs = SaveLogs()
7: CleanUp()
8: n_total += n1 + n2
9: if isEarlyStop(α, β, n_total) then
10: i = S
11: end if
12: end for</p>
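The stage loop of Method 2 can be sketched as follows. All helpers here are hypothetical stand-ins for the Appendix A functions: `power_analysis` returns a fixed equal split, and `is_early_stop` simply fires once an assumed 200-honeypot budget cap is reached.

```python
# Sketch of Method 2 (staged RCT with early stopping); helper bodies are
# placeholders, not the paper's implementations.
def power_analysis(p1, p2, alpha, beta):
    return 10, 10  # placeholder per-stage group sizes (n1 == n2 for RCT)

def deploy(control, corrupted): pass
def wait(hours): pass
def save_logs(): return []
def clean_up(): pass

def is_early_stop(alpha, beta, n_total, budget_cap=200):
    return n_total >= budget_cap  # assumed endpoint: honeypot budget exhausted

def run_rct_trial(n_stages, stage_hours, p1, p2, alpha, beta):
    n1, n2 = power_analysis(p1, p2, alpha, beta)
    n_total, stage_logs = 0, []
    for stage in range(n_stages):          # the loop on "line 2" of Method 2
        deploy(control=n1, corrupted=n2)
        wait(stage_hours)
        stage_logs.append(save_logs())
        clean_up()
        n_total += n1 + n2
        if is_early_stop(alpha, beta, n_total):
            break                          # Method 2 sets i = S; break is equivalent
    return n_total, len(stage_logs)

total, stages_run = run_rct_trial(12, 1, 0.01, 0.4, 0.05, 0.10)
```

With 20 honeypots deployed per stage, the sketch stops after ten of the twelve scheduled stages, once the 200-honeypot cap is consumed.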
        <p>The vanilla deployment (Method 1) takes a given budget B and the maximum trial duration<sup>2</sup> T
to determine the maximum number of devices N that can be observed within this study - noted
as GetNumToDeploy(B, T) on line 1. Then N altered devices are deployed for observation
during the “trial”; no control devices are present.</p>
        <p>
          In contrast, the RCT (Method 2) and AD (Method 3) account for the risk of error when
selecting how many devices to study using power analysis<sup>3</sup>. We pass four parameters into
the power analysis equation from HECT [
          <xref ref-type="bibr" rid="ref31">48</xref>
          ]: the probability of committing a Type I error (α),
the probability of committing a Type II error (β), and the rates of incidence for the control and
corrupted groups. A Type I error means claiming an effect is present due to the corruption
when it is not true. A Type II error means the study did not collect evidence of an effect when
one is present. The power of a study is the complement of the likelihood of committing a Type II
error (1 − β). The rate of incidence for the control and corrupted groups is initially determined by a
pilot study or an educated guess based on related reports. It is an approximation of the rate at which an
event of interest should be observed within the stage. The power analysis equation returns the
        </p>
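To make the sample size calculation concrete, here is a sketch using the standard normal-approximation formula for comparing two proportions. The paper takes its equation from HECT [48], which may differ in detail; this common textbook form is an assumption for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def power_analysis(p1: float, p2: float, alpha: float, beta: float) -> int:
    """Per-group sample size to detect a difference between incidence
    proportions p1 (control) and p2 (corrupted) with Type I error alpha
    and power 1 - beta, using a two-sided normal approximation."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # quantile for alpha (two-sided)
    z_b = NormalDist().inv_cdf(1 - beta)        # quantile for power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# The paper's pilot rates and error tolerances:
n_per_group = power_analysis(p1=0.01, p2=0.4, alpha=0.05, beta=0.10)
```

Because the assumed incidence rates are far apart (0.01 versus 0.4), the required per-group size under this formula is small, which is what lets the staged trials fit comfortably inside the 200-honeypot budget.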
      </sec>
      <sec id="sec-4-3">
        <p><sup>2</sup>We say in the methods that this is given in hours, but any time duration works here. <sup>3</sup>This calculation can be found in more detail in Appendix A.</p>
        <p>total number n_total of devices needed to deploy in each stage. We equally split this value for
RCT and the initial stage of AD. After the first stage of AD, we use the updated rates from the
prior stage to weight the split of n_total, adapting the allocation of resources mid-trial.</p>
        <p>Based on the responses seen in the previous stage within an AD trial, trialists can use interim
analysis to make pre-defined changes that will not invalidate or weaken the power of a study.
Our AD updates the next population counts for control and corrupted. To avoid weakening
the power of our study, we apply this update indirectly between stages through the assumed
rates of incidence (p1 and p2).</p>
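The interim re-split can be sketched as a weighting of the per-stage total by the incidence rates estimated in the previous stage. The paper does not spell out its exact weighting rule, so this proportional scheme with a minimum arm size is one plausible reading, labeled as an assumption.

```python
# Hedged sketch of the AD interim update: re-split the per-stage total
# n_total in proportion to the latest incidence-rate estimates, with a
# floor so neither arm empties out. The proportional rule is an assumption.
def adaptive_split(n_total: int, rate_control: float, rate_corrupted: float,
                   floor: int = 1) -> tuple:
    total_rate = rate_control + rate_corrupted
    if total_rate == 0:
        return n_total // 2, n_total - n_total // 2  # fall back to equal split
    n_corrupted = round(n_total * rate_corrupted / total_rate)
    n_corrupted = min(max(n_corrupted, floor), n_total - floor)
    return n_total - n_corrupted, n_corrupted

# With the paper's initial rates (0.01 control, 0.4 corrupted), most of the
# next stage's honeypots go to the corrupted arm:
n_control, n_corrupted = adaptive_split(40, 0.01, 0.4)
```

Steering allocation toward the higher-incidence arm is the security analogue of response adaptive randomization: it serves the second objective of maximizing recorded events of interest while the control arm keeps anchoring the comparison.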
        <p>
          From the logs we have a complete view into the events of interest for the study. Our AD
method assumes that each participant will have at most one event of interest before terminating
the system, removing it from the trial. In the case of honeypots, this would be to protect the
honeypot from becoming a launchpad or providing free resources to attackers. To calculate the
likelihood of an event of interest occurring, i.e., the rate of incidence within a group, we use a
Kaplan-Meier (KM) Function, a popular approach for survival analysis within healthcare
applications [
          <xref ref-type="bibr" rid="ref32 ref33">49, 50</xref>
          ].
        </p>
        <p>During a stage, the KM function updates the likelihood S(t) upon every event of interest
recorded. At t = 0, all participants are at risk and S(0) = 1. The remaining time steps update
following this: S(t<sub>i+1</sub>) = S(t<sub>i</sub>) · (n<sub>i</sub> − d<sub>i+1</sub>) / n<sub>i</sub>,
where n<sub>i</sub> is the current number of participants at risk and d<sub>i+1</sub> is the number of participants
that have seen events of interest since t<sub>i</sub>. The difference (n<sub>i</sub> − d<sub>i+1</sub>) is not always one. If we know
exactly when all participants see an event of interest, the KM function is calculated whenever an
infection is recorded. In trials without this ability, a time interval must be set to check with the
participants (e.g., every hour or half-hour) to collect data and see if an event has been recorded.</p>
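A minimal sketch of this update rule, assuming events are tallied once per check interval; `km_curve` and its inputs are illustrative names, not from the paper.

```python
# Kaplan-Meier update matching the text: each step multiplies the survival
# estimate by (n_i - d_{i+1}) / n_i, where n_i participants are at risk and
# d_{i+1} saw an event of interest since the previous check.
def km_curve(n_at_risk: int, events_per_step: list) -> list:
    s = 1.0                 # S(0) = 1: all participants at risk
    curve = [s]
    n = n_at_risk
    for d in events_per_step:
        s *= (n - d) / n    # survival drops by the fraction seeing events
        curve.append(s)
        n -= d              # exploited honeypots leave the risk set
    return curve

# 10 honeypots; 2 exploited in the first check interval, 3 in the second:
curve = km_curve(10, [2, 3])
```

The complement 1 − S(t) then serves as the estimated incidence rate for an arm, which is exactly the quantity the AD feeds back into the next stage's allocation.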
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Adaptive Design for Honeypot Deployments</title>
      <p>
        We demonstrate the capabilities of our AD for intrusion data collection in a sample study using
honeypots. Our method can be applied in other intrusion data collections, but we chose one
to illustrate its specific capabilities. In this study we analyze the risk of an ssh vulnerability
within misconfigured cloud servers. This scenario is based on the BETH dataset used as a pilot
study for our trials [
        <xref ref-type="bibr" rid="ref34">51</xref>
        ]. The dataset contains a variety of attacks via the ssh vulnerability, but
we can only empirically infer how the presence of this vulnerability affects its likelihood of
exploitation. Although the presence of an ssh vulnerability is well known to affect the rate
of exploitation in a server, this study emphasizes how our method provides evidence on a
corruption’s impact and the benefits of its adaptation.
      </p>
      <p>This study includes Methods 1, 2, and 3 in separate trials, each attempting to collect evidence
of the corruption’s impact. Our budget restricts each trial to a maximum of 200 honeypots
(approximately $650USD) over 12 hours. As recommended in Section 3, this threshold is noted
as the first of our early stopping criteria. Based on the pilot study, we assume an initial rate
of incidence in the control to be 0.01 and in the corrupted to be 0.4. We ensure the study only
considers strong evidence by limiting the chance of error<sup>4</sup>, setting α = 0.05 and β = 0.10. The
remaining details for our honeypot studies are summarized in our Study Synopsis (Section 4.1).
We then review the results of the study in Section 4.2.</p>
      <sec id="sec-5-1">
        <title>4.1. Study Synopsis</title>
        <p>
          Following the guidance issued for clinical trials [
          <xref ref-type="bibr" rid="ref35">52</xref>
          ], we summarize the characteristics of our
study below:
Study Duration : The maximum total duration per trial is 12 hours. This is applied across the
three trial methodologies compared in this study.
Objectives : (1) Determine if the corruption causes a significant increase in exploitation rate.
(2) Maximize the exploitation rate for honeypots in the U.S. by region in the time specified by
the trial or stage.
        </p>
        <p>
          Endpoints : (a) The maximum number of honeypots that can be recruited into the study is 200.
(b) The total number of honeypots allocated in the corruption group is below 10 (indicating
the event of interest is not recorded frequently enough to study in this duration). (c) The
number of honeypots allocated is identical to the last stage of the AD trial, indicating
strong evidence has been collected regarding the current rates of incidence.
Study Population : Cloud-based honeypots monitored with a kernel-level sensor recording all
create, clone, and kill system calls. Each honeypot runs with 1 vCPU, 32GB memory, and
Ubuntu 20.04. There are no additional programs or fake activity and no active connection
between honeypots. They are all hosted by the same large-scale cloud provider within
the U.S. that instantiates identical servers with unique, randomly assigned IP addresses
upon request. The IP address ranges are based on the requested region; our study only
considers four regions within the U.S.: east-1, east-2, west-1, and west-2.
Study Corruption and Control : The corruption is an ssh vulnerability that accepts any
password for four fake user accounts mimicking IT support accounts on industrial
infrastructure: user, administrator, serv, and support. This corruption is an exaggerated
version of a common misconfiguration seen in cloud servers [
          <xref ref-type="bibr" rid="ref2 ref36 ref37">2, 53, 54</xref>
          ]. Control honeypots
host the same user accounts but only accept “password” as the password. We chose the
word “password” as the password based on evidence of attackers scanning for it in cloud
provider networks [25].
        </p>
        <p>Event of Interest : We record an event when a user login is seen in one of the four user
accounts. Because we never login or generate fake calls to login, any user login seen is
considered malicious.</p>
        <p>Measuring Corruption Effect : This study assumes a binary state model to describe each
honeypot as whether an intrusion has or has not occurred. The state of the honeypot is
determined by real-time monitoring of the logs to deal with ethical issues that may arise
from purposefully exposing compute to adversaries. An exploit is assumed to not have
occurred until this event is seen.</p>
        <p><sup>4</sup>It is generally accepted in healthcare settings to set α = 0.05 and power = 80% (meaning β = 0.2). We assume a
power of 90% (β = 10%) because we know there is a large difference between the control and the corrupted.</p>
        <p>To prevent providing free resources to the attacker or opportunities to launch further attacks,
we terminate the instances upon recording an event of interest. Although this limits the data
we acquire from the study, it satisfies our objectives in recording events of interest. This can be
re-evaluated in alternative studies based on new objectives.</p>
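        <p>The terminate-on-event policy above can be sketched as a simple polling loop. This is an illustrative sketch only: the honeypot object, its methods, and the log schema are hypothetical stand-ins, not the authors' implementation.</p>
        <preformat>
```python
import time

# Illustrative sketch of the terminate-on-event policy: poll a honeypot's
# aggregated kernel-sensor log for a login to one of the four trap accounts,
# then save logs and terminate the instance so attackers get no free
# resources. The `honeypot` object and log schema are hypothetical.
TRAP_ACCOUNTS = {"user", "administrator", "serv", "support"}

def check_for_intrusion(log_lines):
    """Return True if any log line records a login to a trap account."""
    return any(line.get("event") == "login" and line.get("user") in TRAP_ACCOUNTS
               for line in log_lines)

def monitor(honeypot, poll_seconds=60):
    """Poll until an event of interest is seen, then save logs and terminate."""
    while not check_for_intrusion(honeypot.fetch_logs()):
        time.sleep(poll_seconds)
    logs = honeypot.fetch_logs()   # preserve evidence before teardown
    honeypot.terminate()           # deny further use of the instance
    return logs
```
        </preformat>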
        <p>Each honeypot functions independently with no communication between the honeypots
in the same trial. Their logs are aggregated on a central queuing system within their region
(for their trial) and downloaded before termination. Because we use kernel-level sensors, our
implementation can be easily extended for other objectives and vulnerabilities.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Results</title>
        <p>The total number of honeypots deployed and attacks recorded is shown in Table 2. As expected,
the AD trial deployed the fewest honeypots overall while recording more attacks than the RCT. The
AD trial recorded around 36% of the total attacks seen in the vanilla trial, which saw the highest
number of intrusions. This is because the vanilla trial did not deploy any control honeypots,
which had a small rate of incidence. However, by not including a control group, the vanilla trial does not
account for potential bias in the corruption implementation, preventing it from confidently
identifying the causal relationship of the corruption’s effect. Even so, the data collected could
still be used, provided the study’s assumptions are rigorously documented. This
enables other researchers to review it independently and determine whether the study’s findings are
relevant for their environments.</p>
        <p>[Figure 1: (a) RCT by Region; (b) RCT by Stage]</p>
        <p>From the trials with a control group, it is clear that the corruption causes an increase in
exploitation rate. As can be seen in the trial summary in Table 3, more control honeypots were
exploited than expected in the AD trial, causing a disparity between our initial assumption
(p1 = 1%) and the results from the first stage (marginalized p1 = 15%). This caused our AD
method to reallocate resources, requesting more honeypots to accommodate the error in the
initial assumptions without triggering an endpoint or requiring the study to be re-evaluated.
After the second stage of the AD trial concluded with no exploits in the control group, the
control arm was dropped, confirming that the corruption led to more infections.</p>
        <p>We could have ended the AD trial after the first stage (saving 66% of the trial’s budget) had
Endpoint (b) included the control group in the group-size minimum. This was not done because
of the second objective to collect intrusion data. Even with the larger request in stage two, the
AD trial requested fewer honeypots than both the vanilla trial and the RCT. The stages within
the trial also limited the time each honeypot was online, denying adversaries the extended window
needed to develop a signature for our trap.</p>
        <p>Although the rate of exploit was exponential across the trials, we noticed signs of
instability within the four-hour stages, where some regions were observed to have different exploitation
rates. This was especially apparent when comparing the survival curves by region and by stage
within the RCT trial, shown in Figure 1a and Figure 1b. Around 60-120 minutes into
the stage, the rate of infection diverges across regions: us-west-1 was hit first, then
us-east-2, us-west-2, and us-east-1, respectively. From the IP addresses of the hosts, there
is no obvious indication of sequential IP scanning. This instability is itself a result of this study,
so future work can note the impact of smaller time windows. Because it was unanticipated at
the start of this study and not a pre-specified early stopping criterion, future work will include it
to give the trialists an opportunity to discuss whether the trial should continue.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>Our work is the first to apply an adaptive experimental study to intrusion data collection and
to discuss the benefits of collecting counterfactual information with a control group. We provide
general guidance on running an experimental study, including the factors that must be documented prior to
conducting it. Our AD method extends this by optimizing resource allocation based on
events seen at every stage, ensuring statistical confidence through power analysis
on updated exploitation likelihoods, under the assumption that an event occurs at most once per
participant. Because the interventional data collected contains relationships between features
established through experimentation, statistical models trained on such data can be trusted
further in learning general trends. This method is especially applicable for security
studies seeking to identify causal relations between a corruption and automated attacks in the
wild.</p>
      <p>We then implemented our method in a honeypot study, confirming that the corruption (an
ssh vulnerability) increased the infection rate of misconfigured cloud servers. This study also
found that, while observational studies record more intrusions (i.e., in the vanilla trial),
the presence of a control group (as in RCT and AD) is what enables us to identify the corruption
effect. Our AD confirms the corruption effect more efficiently than RCT, requiring only
33% of the total trial duration to conclude the corruption effect and using 17% fewer honeypots to
see 19% more attacks. Prior to conducting the study, we knew the corruption would increase the
infection rate because attackers were provided more options for password entry, including the
control’s “password” for the same user accounts. Had the difference due to the corruption been
less apparent (e.g., when altering multiple points of entry or limiting sequences of vulnerability
exploits), our study would have required more time and resources to collect evidence.</p>
      <p>
        Future work should consider implementing multiple vulnerabilities to study the interaction
of corruptions. For example, one could add vulnerable applications within the honeypots to
either study the scanning and exploitation of multiple existing programs or trace the sequence of
exploits from the ssh vulnerability to a vulnerable application. This would require introducing
a new methodology that can simultaneously consider multiple treatment arms, such as
REMAP-CAP [
        <xref ref-type="bibr" rid="ref28">45</xref>
        ]. By isolating causal relationships, we hope these data can assist in generalizing
solutions, removing some bias in the data, and enabling other improvements in the intrusion
detection community.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>Thanks to our reviewers and colleagues for their feedback. We also thank the AI for Cyber
Defence leads and causal colleagues at The Alan Turing Institute, whose interest in our work
was great encouragement. Finally, thank you to the CAMLIS organizers for running a fantastic
conference this year, providing a platform for many researchers to give enthusiastic and honest
feedback on our work.</p>
        <p>[17] C. van Werkhoven, S. Harbarth, M. Bonten, Adaptive designs in clinical trials in critically
ill patients: principles, advantages and pitfalls, Intensive Care Medicine 45 (2019) 678–682.
[18] N. Stallard, L. Hampson, N. Benda, W. Brannath, T. Burnett, T. Friede, P. K. Kimani,
F. Koenig, J. Krisam, P. Mozgunov, et al., Efficient adaptive designs for clinical trials of
interventions for covid-19, Statistics in Biopharmaceutical Research 12 (2020) 483–497.
[19] P. Pallmann, A. W. Bedding, B. Choodari-Oskooei, M. Dimairo, L. Flight, L. V. Hampson,
J. Holmes, A. P. Mander, L. Odondi, M. R. Sydes, et al., Adaptive designs in clinical trials:
why use them, and how to run and report them, BMC medicine 16 (2018) 1–15.
[20] N. Provos, T. Holz, Virtual honeypots: from botnet tracking to intrusion detection, Pearson Education, 2007.</p>
        <p>
[21] V. Nicomette, M. Kaâniche, E. Alata, M. Herrb, Set-up and deployment of a high-interaction
honeypot: experiment and lessons learned, Journal in computer virology 7 (2011) 143–157.
[22] E. Alata, V. Nicomette, M. Kaâniche, M. Dacier, M. Herrb, Lessons learned from the
deployment of a high-interaction honeypot, in: 2006 Sixth European Dependable Computing
Conference, IEEE, 2006, pp. 39–46.
[23] S. K. Brew, E. Ahene, Threat landscape across multiple cloud service providers using
honeypots as an attack source, in: Frontiers in Cyber Security: 5th International
Conference, FCS 2022, Kumasi, Ghana, December 13–15, 2022, Proceedings, Springer, 2022, pp.
163–179.
[24] S. Machmeier, Honeypot implementation in a cloud environment, arXiv preprint
arXiv:2301.00710 (2023).
[25] C. Kelly, N. Pitropakis, A. Mylonas, S. McKeown, W. J. Buchanan, A comparative analysis
of honeypots on different cloud platforms, Sensors 21 (2021) 2433.
[26] S. Peisert, M. Bishop, How to design computer security experiments, in: L. Futcher,
R. Dodge (Eds.), Fifth World Conference on Information Security Education, Springer US,
New York, NY, 2007, pp. 141–148.
[27] Office for Health Improvement and Disparities, Randomised controlled trial:
comparative studies, GOV.UK (2021). URL: https://www.gov.uk/guidance/
randomised-controlled-trial-comparative-studies.
[28] E. Hariton, J. J. Locascio, Randomised controlled trials—the gold standard for effectiveness
research, BJOG: an international journal of obstetrics and gynaecology 125 (2018) 1716.
[29] K. Gupta, J. Attri, A. Singh, H. Kaur, G. Kaur, et al., Basic concepts for sample size
calculation: critical step for any clinical trials!, Saudi journal of anaesthesia 10 (2016) 328.
[30] G. A. Van Norman, Drugs, devices, and the fda: Part 1: An overview of approval
processes for drugs, JACC: Basic to Translational Science 1 (2016) 170–179. URL:
https://www.sciencedirect.com/science/article/pii/S2452302X1600036X. doi:https://doi.
org/10.1016/j.jacbts.2016.03.002.
[31] J. Wason, Improving the efficiency of clinical trials with adaptive
designs, Research Design Service Blog (2019). URL: https://www.rdsblog.org.uk/
improving-the-efficiency-of-clinical-trials-with-adaptive-designs.
[32] B. Franklin, The digital forecast: 40-plus cloud computing stats and trends to
know in 2023, Google Cloud (2023). URL: https://cloud.google.com/blog/transform/
top-cloud-computing-trends-facts-statistics-2023.
[33] L. Martin, M. Hutchens, C. Hawkins, A. Radnov, How much do clinical trials cost?, Nature Reviews Drug Discovery 16 (2017). URL: https://doi.org/10.1038/nrd.2017.70.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Function Definitions</title>
      <sec id="sec-8-1">
        <title>Pseudo code function definitions used in Methods 1, 2, and 3. Shared functions:</title>
        <p>Deploy(control=n1, corrupted=n2): Given the number of devices to observe in each group
of the study (n1 control devices and n2 corrupted devices), deploy them and record logs from all
devices in a centralized location.</p>
        <p>Wait(t): Wait the length of time specified by t.</p>
        <p>L = SaveLogs(): Fetch logs and return them to be stored in variable L.</p>
        <p>CleanUp(): Shut down devices and clean up any infrastructure not kept at the end of the
trial or stage.</p>
        <p>GetNumToDeploy(B, T): Given the budget B and the maximum trial duration T, it returns the
maximum number of devices N that can be observed within this study.</p>
        <p>isEarlyStop(L): Given the logs collected from the last stage, determine if the pre-specified
early stopping conditions have been met and the trial must immediately terminate.</p>
        <p>PowerAnalysis(p1, p2, α, β): See Section A.1</p>
        <p>SurvivalAnalysis(L): See Section A.2</p>
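        <p>As a minimal sketch (not the authors' pseudocode), the shared functions above might compose into one adaptive trial loop as follows. The return conventions assumed here, such as PowerAnalysis returning a per-arm pair and SurvivalAnalysis returning updated incidence rates, are illustrative assumptions.</p>
        <preformat>
```python
# Sketch of one adaptive-design trial built from the shared functions above,
# passed in as a dict of callables. Each stage deploys the allocation from the
# latest power analysis, waits, collects logs, checks the pre-specified early
# stopping conditions, and re-estimates incidence rates for the next stage.
# Return conventions for PowerAnalysis / SurvivalAnalysis are assumptions.
def run_adaptive_trial(funcs, budget, duration, stages, stage_length,
                       p1, p2, alpha, beta):
    n_max = funcs["GetNumToDeploy"](budget, duration)  # overall device budget
    all_logs = []
    for _stage in range(stages):
        n1, n2 = funcs["PowerAnalysis"](p1, p2, alpha, beta)
        if n1 + n2 > n_max:            # respect the remaining budget
            break
        funcs["Deploy"](control=n1, corrupted=n2)
        funcs["Wait"](stage_length)
        logs = funcs["SaveLogs"]()
        funcs["CleanUp"]()
        all_logs.append(logs)
        if funcs["isEarlyStop"](logs):  # pre-specified stopping conditions
            break
        n_max -= n1 + n2
        p1, p2 = funcs["SurvivalAnalysis"](logs)  # updated incidence rates
    return all_logs
```
        </preformat>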
        <sec id="sec-8-1-1">
          <title>A.1. Power Analysis for Sample Size</title>
          <p>
We denote the trial as robust when it follows a strict calculation of the sample size to be deployed
that accounts for Type I and Type II error in the data. For this paper, we follow the equation from
HECT [
            <xref ref-type="bibr" rid="ref31">48</xref>
            ] which follows power analysis to calculate the sample size:
n_total = 2 ∗ ⌈ ( z_{1−α/2} √(2 p̄ q̄) + z_{1−β} √(p1 q1 + p2 q2) )² / (p1 − p2)² ⌉, where p̄ = (p1 + p2)/2 and q̄ = 1 − p̄, and:
n_total = Total sample size for the study group, which is later split across the different treatments
and regions. We alter the equation to treat the two study arms as one, since the
split will depend on the ratio found during the trial in interim analysis.
z = critical value of the standard normal distribution for a given α- or β-based subscript
p1 = Control Incidence: The assumed rate of the outcome occurring in the control group is
initially p1 = 1%.
p2 = Treatment Incidence: The assumed rate of an outcome occurring in the corrupted group,
based on our pilot study5 and conservative rounding, is p2 = 40%.
q1, q2 = 1 − p1, 1 − p2 (respectively)
α = Type I Error: The probability of claiming an effect on the infection rate when there is none. Our setting,
and the generally accepted probability in clinical studies, is α = 5%.
β = Type II Error: The probability of failing to detect a true effect on the infection rate.
          </p>
          <p>The inverse (1 − β) is known as the power of the study. In this study,6 we assume a power
of 90% (β = 10%) because we know there is a large difference between the control and
the corrupted.</p>
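          <p>The calculation can be sketched in a few lines, assuming the standard two-proportion power-analysis formula (per-arm size rounded up, then doubled for two equal arms); this is our illustrative rendering, not the authors' code.</p>
          <preformat>
```python
from math import ceil, sqrt
from statistics import NormalDist

def total_sample_size(p1, p2, alpha=0.05, beta=0.10):
    """Two-proportion sample size via power analysis.

    p1, p2 : assumed incidence in the control and corrupted groups
    alpha  : Type I error (two-sided)
    beta   : Type II error (power = 1 - beta)
    """
    z = NormalDist().inv_cdf           # standard-normal critical values
    z_a, z_b = z(1 - alpha / 2), z(1 - beta)
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    n_arm = (z_a * sqrt(2 * p_bar * q_bar)
             + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p1 - p2) ** 2
    return 2 * ceil(n_arm)             # round up per arm, double for two arms
```
          </preformat>
          <p>With the paper's assumptions (p1 = 1%, p2 = 40%, α = 5%, power = 90%), the required total is small, reflecting the large assumed effect.</p>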
          <p>In the medical adaptive design, the KM-provided likelihoods would directly form the new p1
and p2. This would encourage the model to decrease the number of death events seen within
the trial, which makes sense in a healthcare setting. However, we wish to increase the
number of death events seen, so we invert the likelihoods from the KM, which we call the risk
rates (RRs). The RRs are marginalised and passed to Equation 2 to get our new n_total. Unlike
the RCT allocation of honeypots (equally splitting n_total between control and corrupted), AD
uses the RRs to weight the allocation so that regions and corruption assignments with higher RR are
given more honeypots in the following stage. This is repeated at every interim analysis for the
duration of the trial.</p>
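          <p>The RR-weighted allocation could look like the sketch below. The largest-remainder rounding is our illustrative choice for making the allocation sum exactly to the requested total; the authors' rounding scheme is not specified here.</p>
          <preformat>
```python
# Hypothetical sketch of RR-weighted allocation: instead of the RCT's even
# split, each (region, arm) cell receives a share of the next stage's total
# proportional to its marginalised risk rate. Leftover honeypots from integer
# truncation go to the cells with the largest fractional remainders.
def allocate_by_risk(n_total, risk_rates):
    """risk_rates: dict mapping a (region, arm) cell to its risk rate."""
    total_rr = sum(risk_rates.values())
    raw = {cell: n_total * rr / total_rr for cell, rr in risk_rates.items()}
    alloc = {cell: int(x) for cell, x in raw.items()}   # truncate first
    leftover = n_total - sum(alloc.values())
    # hand out the remainder by largest fractional part (largest remainders)
    for cell in sorted(raw, key=lambda c: raw[c] - alloc[c],
                       reverse=True)[:leftover]:
        alloc[cell] += 1
    return alloc
```
          </preformat>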
        </sec>
        <sec id="sec-8-1-2">
          <title>A.2. Survival Analysis for Updating Risk Rates</title>
          <p>
            We assume in our AD study that each participant will have at most one event of interest before
terminating the system, protecting it from becoming a launchpad or providing free resources
to attackers. The events of interest are parsed out and passed to a survival function which
calculates the likelihood of surviving, i.e., the probability that a system has not yet seen the event of interest. We chose
to use a Kaplan-Meier (KM) Function, a popular approach for survival analysis [
            <xref ref-type="bibr" rid="ref32 ref33">49, 50</xref>
            ].
          </p>
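            <p>Since we assume at most one event per honeypot, the KM estimate can be computed directly from (time, observed) pairs, with honeypots still online at the end of a stage treated as right-censored. The sketch below is a minimal illustration with a hypothetical input format, not the authors' code.</p>
            <preformat>
```python
# Minimal Kaplan-Meier sketch: input is a list of (time, observed) pairs,
# where observed=False marks a right-censored honeypot (no event seen before
# teardown). Output maps each event time to the estimated survival
# probability, multiplying (n - d) / n at every event time.
def kaplan_meier(observations):
    at_risk = len(observations)
    survival, s = {}, 1.0
    events = {}                       # event time -> number of events d
    for t, observed in observations:
        if observed:
            events[t] = events.get(t, 0) + 1
    censored = sorted(t for t, obs in observations if not obs)
    for t in sorted(events):
        # drop honeypots censored strictly before this event time
        while censored and t > censored[0]:
            censored.pop(0)
            at_risk -= 1
        d = events[t]
        s *= (at_risk - d) / at_risk  # S(t_next) = S(t) * (n - d) / n
        survival[t] = s
        at_risk -= d
    return survival
```
            </preformat>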
        </sec>
      </sec>
      <sec id="sec-8-2">
        <title>Notes</title>
        <p>5 Citation removed for anonymity.</p>
        <p>6 Generally accepted in clinical studies as power = 80%, which means β = 20%.</p>
        <p>The KM estimator updates the survival probability at each event time: S(t_{j+1}) = S(t_j) · (n_{j+1} − d_{j+1}) / n_{j+1}, where
n_{j+1} is the current number of participants at risk and
d_{j+1} is the number of participants that have seen events of interest since t_j.
If we have full information on when all participants within the study see an event of interest,
the KM function is updated whenever an infection is recorded. In clinical trials without full
information on their patients, a time interval is set to check in with the patient (e.g., every
month or year) to collect data and see if they are alive.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Team</surname>
          </string-name>
          ,
          <source>Spamhaus botnet threat update: Q4-2021</source>
          ,
          <string-name>
            <given-names>Spamhaus</given-names>
            <surname>News</surname>
          </string-name>
          (
          <year>2017</year>
          ). URL: https://www.spamhaus.org/news/article/817/spamhaus-botnet
          <source>-threat-update-q4-2021.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Muir</surname>
          </string-name>
          ,
          <article-title>Real-world detection evasion techniques in the cloud</article-title>
          ,
          <year>2022</year>
          . URL: https://www.blackhat.com/eu-22/briefings/schedule/ #
          <article-title>real-world-detection-evasion-techniques-in-the-cloud-29053, black Hat Europe</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Sgaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. Charles,</surname>
          </string-name>
          <article-title>The case for causal ai</article-title>
          ,
          <source>Stanford Social Innovation Review</source>
          <volume>18</volume>
          (
          <year>2020</year>
          )
          <fpage>50</fpage>
          -
          <lpage>55</lpage>
          . URL: https://doi.org/10.48558/KT81-SN73. doi:https://doi.org/ 10.48558/KT81- SN73.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Howards</surname>
          </string-name>
          ,
          <article-title>An overview of confounding. part 1: the concept and how to address it</article-title>
          ,
          <source>Acta Obstetricia et Gynecologica Scandinavica</source>
          <volume>97</volume>
          (
          <year>2018</year>
          )
          <fpage>394</fpage>
          -
          <lpage>399</lpage>
          . URL: https: //obgyn.onlinelibrary.wiley.com/doi/abs/10.1111/aogs.13295. doi:https://doi.org/10. 1111/aogs.13295.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Labs</surname>
          </string-name>
          ,
          <year>1998</year>
          <article-title>darpa intrusion detection evaluation dataset, 1998</article-title>
          . URL: https://www.ll. mit.edu/r-d/datasets/1998-darpa
          <article-title>-intrusion-detection-evaluation-dataset.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tavallaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bagheri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ghorbani</surname>
          </string-name>
          ,
          <article-title>A detailed analysis of the kdd cup 99 data set, in: 2009 IEEE symposium on computational intelligence for security and defense applications</article-title>
          , IEEE,
          <year>2009</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shiravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shiravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tavallaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ghorbani</surname>
          </string-name>
          ,
          <article-title>Toward developing a systematic approach to generate benchmark datasets for intrusion detection</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>31</volume>
          (
          <year>2012</year>
          )
          <fpage>357</fpage>
          -
          <lpage>374</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/ S0167404811001672. doi:https://doi.org/10.1016/j.cose.
          <year>2011</year>
          .
          <volume>12</volume>
          .012.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>M. J. M. Turcotte</surname>
            ,
            <given-names>A. D.</given-names>
          </string-name>
          <string-name>
            <surname>Kent</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Hash</surname>
          </string-name>
          ,
          <article-title>Unified Host and Network Data Set</article-title>
          , World Scientific,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          . URL: https://www.worldscientific.com/doi/abs/10.1142/9781786345646_
          <fpage>001</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Hamman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mewhirter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Harknett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vićić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>White</surname>
          </string-name>
          , Deciphering cyber operations,
          <source>The Cyber Defense Review</source>
          <volume>5</volume>
          (
          <year>2020</year>
          )
          <fpage>135</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Andrew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Spillard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Collyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dhir</surname>
          </string-name>
          ,
          <article-title>Developing optimal causal cyber-defence agents via cyber security simulation</article-title>
          ,
          <source>arXiv preprint arXiv:2207.12355</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dumitras</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Neamtiu</surname>
          </string-name>
          ,
          <article-title>Experimental challenges in cyber security: a story of provenance and lineage for malware</article-title>
          ,
          <source>in: 4th Workshop on Cyber Security Experimentation and Test (CSET 11)</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Howards</surname>
          </string-name>
          ,
          <article-title>An overview of confounding. part 2: how to identify it and special situations</article-title>
          ,
          <source>Acta Obstetricia et Gynecologica Scandinavica</source>
          <volume>97</volume>
          (
          <year>2018</year>
          )
          <fpage>400</fpage>
          -
          <lpage>406</lpage>
          . URL: https: //obgyn.onlinelibrary.wiley.com/doi/abs/10.1111/aogs.13293. doi:https://doi.org/10. 1111/aogs.13293.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Dhir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hoeltgebaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Briers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <article-title>Prospective artificial intelligence approaches for active cyber defence</article-title>
          ,
          <source>arXiv preprint arXiv:2104.09981</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          ,
          <article-title>Evolution of clinical research: a history before and beyond james lind</article-title>
          ,
          <source>Perspectives in clinical research 1</source>
          (
          <year>2010</year>
          )
          <article-title>6</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mackenzie</surname>
          </string-name>
          ,
          <article-title>The book of why: the new science of cause and efect</article-title>
          , Basic books,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Houle</surname>
          </string-name>
          ,
          <article-title>An introduction to the fundamentals of randomized controlled trials in pharmacy research</article-title>
          ,
          <source>The Canadian journal of hospital pharmacy 68</source>
          (
          <year>2015</year>
          )
          <fpage>28</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>C.</given-names>
            <surname>Simoiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bonneau</surname>
          </string-name>
          , S. Goel, “
          <article-title>i was told to buy a software or lose my computer. i ignored it”: A study of ransomware</article-title>
          ,
          <source>in: Proceedings of the Fifteenth Symposium on Usable Privacy and Security (SOUPS</source>
          <year>2019</year>
          ),
          <year>2019</year>
          , pp.
          <fpage>155</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Capecci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Ellis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Rocha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dommaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Ebner</surname>
          </string-name>
          ,
          <article-title>Susceptibility to spear-phishing emails: Efects of internet user demographics and email content, ACM Transactions on Computer-Human Interaction (TOCHI) 26 (</article-title>
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A.</given-names>
            <surname>Somayaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Inoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ford</surname>
          </string-name>
          ,
          <article-title>Evaluating security products with clinical trials</article-title>
          .,
          <source>in: CSET</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Lévesque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chiasson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Somayaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <article-title>Technological and human factors of malware attacks: A computer security clinical trial approach</article-title>
          ,
          <source>ACM Transactions on Privacy and Security (TOPS) 21</source>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>F.</given-names>
            <surname>Lalonde Levesque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nsiempba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chiasson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Somayaji</surname>
          </string-name>
          ,
          <article-title>A clinical study of risk factors related to malware infections</article-title>
          ,
          <source>in: Proceedings of the 2013 ACM SIGSAC Conference on Computer &amp; Communications Security, CCS '13</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2013</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          . URL: https://doi.org/10.1145/2508859.2516747. doi:10.1145/2508859.2516747.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>T.-F.</given-names>
            <surname>Yen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Heorhiadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oprea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Reiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Juels</surname>
          </string-name>
          ,
          <article-title>An epidemiological study of malware encounters in a large enterprise</article-title>
          ,
          <source>in: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1117</fpage>
          -
          <lpage>1130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bošnjak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Brumen</surname>
          </string-name>
          ,
          <article-title>Shoulder surfing experiments: A systematic literature review</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>99</volume>
          (
          <year>2020</year>
          )
          <fpage>102023</fpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0167404820302960. doi:10.1016/j.cose.2020.102023.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-L.</given-names>
            <surname>Barabási</surname>
          </string-name>
          ,
          <article-title>A genetic epidemiology approach to cyber-security</article-title>
          ,
          <source>Scientific reports 4</source>
          (
          <year>2014</year>
          )
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>V. W.</given-names>
            <surname>Berger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Bour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Carter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Chipman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Everett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Heussen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hewitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.-D.</given-names>
            <surname>Hilgers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. A.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Renteria</surname>
          </string-name>
          , et al.,
          <article-title>A roadmap to using randomization in clinical trials</article-title>
          ,
          <source>BMC Medical Research Methodology</source>
          <volume>21</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Pignon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Arriagada</surname>
          </string-name>
          ,
          <article-title>Early stopping rules and long-term follow-up in phase III trials</article-title>
          ,
          <source>Lung Cancer</source>
          <volume>10</volume>
          (
          <year>1994</year>
          )
          <fpage>S151</fpage>
          -
          <lpage>S159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <article-title>Interim analysis: a rational approach of decision making in clinical trial</article-title>
          ,
          <source>Journal of advanced pharmaceutical technology &amp; research 7</source>
          (
          <year>2016</year>
          )
          <fpage>118</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Angus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Berry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Al-Beidh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Arabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>van Bentum-Puijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bhimani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bonten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Broglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brunkhorst</surname>
          </string-name>
          , et al.,
          <article-title>The remap-cap (randomized embedded multifactorial adaptive platform for community-acquired pneumonia) study. rationale and design</article-title>
          ,
          <source>Annals of the American Thoracic Society</source>
          <volume>17</volume>
          (
          <year>2020</year>
          )
          <fpage>879</fpage>
          -
          <lpage>891</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Sydes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Mason</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. W.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Amos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>de Bono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Dearnaley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dwyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Green</surname>
          </string-name>
          , et al.,
          <article-title>Flexible trial design in practice-stopping arms for lack-of-benefit and adding research arms mid-trial in stampede: a multi-arm multi-stage randomized controlled trial</article-title>
          ,
          <source>Trials</source>
          <volume>13</volume>
          (
          <year>2012</year>
          )
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Response-adaptive randomization for clinical trials with adjustment for covariate imbalance</article-title>
          ,
          <source>Statistics in medicine 29</source>
          (
          <year>2010</year>
          )
          <fpage>1761</fpage>
          -
          <lpage>1768</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>K.</given-names>
            <surname>Thorlund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Golchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Haggstrom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mills</surname>
          </string-name>
          ,
          <article-title>Highly efficient clinical trials simulator (hect): Software application for planning and simulating platform adaptive trials</article-title>
          ,
          <source>Gates Open Research</source>
          <volume>3</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>L.</given-names>
            <surname>Sullivan</surname>
          </string-name>
          ,
          <article-title>Time to event variables</article-title>
          ,
          <source>Survival Analysis</source>
          (Date Accessed: 13 January
          <year>2023</year>
          ). URL: https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_survival/BS704_Survival_print.html.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>D.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Enns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Koulecar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sihag</surname>
          </string-name>
          ,
          <article-title>Two approaches to survival analysis of open source python projects</article-title>
          ,
          <source>arXiv preprint arXiv:2203.08320</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>K.</given-names>
            <surname>Highnam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Arulkumaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hanif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. R.</given-names>
            <surname>Jennings</surname>
          </string-name>
          ,
          <article-title>Beth dataset: Real cybersecurity data for unsupervised anomaly detection research</article-title>
          ,
          <source>The Conference on Applied Machine Learning in Information Security (CAMLIS)</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [52]
          <article-title>National Institutes of Health (NIH), NIH and FDA Release Protocol Template for Phase 2 and 3 IND/IDE Clinical Trials</article-title>
          ,
          <source>Notice Number: NOT-OD-17-064</source>
          ,
          <year>2017</year>
          . URL: https://grants.nih.gov/grants/guide/notice-files/NOT-OD-17-064.html.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>G.</given-names>
            <surname>Wagener</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>State</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Engel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dulaunoy</surname>
          </string-name>
          ,
          <article-title>Adaptive and self-configurable honeypots</article-title>
          ,
          <source>in: 12th IFIP/IEEE International Symposium on Integrated Network Management (IM 2011) and Workshops</source>
          , IEEE,
          <year>2011</year>
          , pp.
          <fpage>345</fpage>
          -
          <lpage>352</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Abdulsalam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hedabou</surname>
          </string-name>
          ,
          <article-title>Security and privacy in cloud computing: Technical review</article-title>
          ,
          <source>Future Internet</source>
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>11</fpage>
          . URL: http://dx.doi.org/10.3390/fi14010011. doi:10.3390/fi14010011.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>