=Paper=
{{Paper
|id=Vol-3056/paper-11
|storemode=property
|title=Cheat Detection in Cyber Security Capture The Flag Games - An Automated Cyber Threat Hunting Approach
|pdfUrl=https://ceur-ws.org/Vol-3056/paper-11.pdf
|volume=Vol-3056
|authors=Robert CHETWYN,Laszlo ERDODI
}}
==Cheat Detection in Cyber Security Capture The Flag Games - An Automated Cyber Threat Hunting Approach==
<pdf width="1500px">https://ceur-ws.org/Vol-3056/paper-11.pdf</pdf>
<pre>
Cheat Detection In Cyber Security Capture The Flag
Games - An Automated Cyber Threat Hunting
Approach
Robert A. Chetwyn (1), László Erdődi (2)
(1) University of Oslo, Department of Informatics, Gaustadalléen 23B, 0373 Oslo, Norway
(2) Norwegian University of Science and Technology, Department of Information Security and Communication Technology,
Gløshaugen, 7034 Trondheim, Norway


Abstract
Capture-the-flag style cyber security games (CTF) are one of the most popular ways of learning and
teaching ethical hacking. These CTF games usually present a set of hacking tasks or challenges that
simulate a vulnerability to be compromised. When the participant compromises the vulnerability they
are presented with a secret flag that is uploaded to prove the participant’s completion of a challenge. Whilst
this secret flag confirms successful completion of a challenge, it does little to verify the legitimacy of a
participant’s activities. We propose a process for plagiarism detection in web application CTF games via
automated cyber threat hunting techniques. Using log data captured from penetration testing courses, we
develop a series of indicators of compromise (IOCs) for each CTF challenge that are attributed to a participant’s
activities. We develop an automated querying tool that interfaces with the Elastic Stack to query these
IOCs and classify participant activities as suspicious or benign without false positives.

Keywords
Security automation, threat hunting, security education, plagiarism, penetration testing


1. Introduction
Capture-the-flag style cyber security games (CTF) are one of the most popular ways of learning
and teaching ethical hacking. These CTF games usually present a set of hacking tasks or
challenges that simulate a vulnerability to be compromised such as in [1] [2] [3] and our own
CTF platform: Hacking Arena [4]. When the participant compromises the vulnerability they are
usually presented with a secret flag that is used to prove a participant’s successful completion
of a challenge. Whilst this secret flag confirms successful completion of a challenge, it does
little to verify the legitimacy of this compromise.
   This lack of verification is problematic in both academic and industry environments,
where plagiarism affects the integrity of the provided courses and of participants’ certification. In
2019 plagiarism was reported for the Offensive Security Certified Professional (OSCP) exam,
where an ex-participant published write-ups of the OSCP exam challenges, leaking
the exam solutions [5]. These compromised challenges were reportedly still present in later

C&ESAR’21: Computer Electronics Security Application Rendezvous, November 16-17, 2021, Rennes, France
Email: roberac@ifi.uio.no (R. A. Chetwyn); laszlo.erdodi@ntnu.no (L. Erdődi)
ORCID: 0000-0002-2028-849X (R. A. Chetwyn); 0000-0002-4910-4228 (L. Erdődi)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073


Proceedings of the 28th C&ESAR (2021)                                                                                                                                            175


examinations after the leak, consequently leading to possible reuse of challenge solutions.
Similarly, examination ’brain-dumps’, the publishing of exam questions, topics and answers [6],
create a problem with information reuse. Participants can reuse the information provided by
brain-dumps to complete CTF challenges, skipping prerequisite steps and submitting the flags.
   Given the popularity of CTF challenges for delivering cyber security training, we are
motivated to ensure the integrity of this delivery and to monitor each challenge for plagiarism.
This paper provides an automated monitoring solution for web-based CTF challenges, based
upon cyber threat hunting, that is capable of detecting plagiarism activities with high precision.
   The paper is structured as follows: Section 2 provides background information on CTF
challenges, an overview of the Hacking Arena and the proposed threat hunting methods. Section
3 discusses related research. Section 4 explains the infrastructure and methods used to conduct
the research. Section 5 presents the findings and Section 6 discusses these findings in depth.


2. Background
CTF competitions usually present a set of hacking tasks or challenges where each challenge is
defined by one vulnerability or a chain of vulnerabilities associated with a secret flag. The aim
of a participant is to exploit the vulnerability in each challenge, and thus capture the associated
flag. Once captured, the participant submits this flag as confirmation of CTF completion. Unlike
in real hacking scenarios no further steps are required from the attacker after exploiting the
vulnerability such as maintaining continuous access to the target, uploading attacking scripts,
or establishing a connection to a command and control server.
   Based on the secret flag an unambiguous criterion is provided for each challenge to decide
whether a challenge was solved or not. Challenges can be classified according to the type
of problem they present (e.g., web hacking challenge or binary exploitation). Unlike in real
security incidents the human factors are excluded from the solution unless other information is
provided for the challenge, so an attacker has to rely on their knowledge and reasoning, but not
on social engineering. To find the solution of the hacking challenge the participants have to
carry out attack steps in the right order. By solving a step the participants might receive new
information needed to achieve the next step. Additional information can also come from challenge
hints (e.g. what is worth trying) that help the attacker to proceed in the
right direction.
   Standard CTFs run in Jeopardy mode, meaning that all the participants are attackers and all
the challenges are static, so challenges do not change throughout the competition. In other
CTF variants, participants may be divided into two teams: a red team, focusing on attacking
the target system, and a blue team defending the target system. Alternatively, each team
may be provided with an infrastructure they have to protect - they can change it to strengthen
the service - while, at the same time, attacking the infrastructure of other teams. In
Jeopardy-style challenges the steps of the solutions are always the same. Trading flags
with other teams to achieve a better position in the competition is a relevant risk in all CTF games.
To prevent and deter plagiarism the CTF game provider has to monitor the solution steps to
exclude teams with unrealistic solutions. Many CTF competitions offer large prizes,
and good results can bring professional benefit. Detecting plagiarism in such CTF games is
therefore essential.
   Since each challenge is broken down into a series of attack steps, the challenge step dependencies
can be transformed into indicators of compromise (IOC). These IOCs are artefacts of
forensic evidence that are matched to logged events from the participant’s interactions with the
CTF challenges. Bianco (2014) [7] presents the ’Pyramid of Pain’ (PoP), as seen in Figure 1, which
categorises types of IOCs that can be linked to a participant’s activities. An example of how the
PoP can be utilised with our challenge scenarios is the following: a CTF challenge step requires
a user to submit a string of >31 characters to produce an error. The logged POST parameters are
observable Network/Host Artefacts. These artefacts can then be matched to other evidence such
as IP addresses, unique session IDs and web user-agents that compromised the CTF challenges
to determine if the actions of a participant were legitimate or plagiarised. If a user compromises
the challenge without fulfilling the steps then illegitimate activity has taken place.


Figure 1: Pyramid of Pain - IOC types that can be used to detect a participant’s activities [7]


  The Hacking Arena environment hosted at the University of Oslo [4] is utilised to teach
ethical hacking modules through a variety of web-based CTF style challenges. For this research
we utilise HTTP logs gathered from two taught ethical hacking modules via the Hacking Arena’s
CTF challenges to aid and develop our research.


3. Related Work
A previous study on cheat detection in CTF challenges can be found in the work conducted by
Kakouros (2020) [8]. This research utilises an inference engine and ad-hoc CTF infrastructure to
capture and monitor the actions of players within challenges. However, this research is limited
in that it does not take into account the sequence of events that took place, only matching steps
independently and can therefore be manipulated by the user. Similarly, Kakouros’ [8] approach
is limited in its attribution of events to actors.




   Previous studies have found that the utilisation of threat hunting methods with inference
engines is effective at tracing the lifecycle of a threat actor’s actions [9] [10] [11]. Al-Mohannadi
et al. (2020) [9] utilise cyber threat hunting techniques with the ELK stack [12] to analyse honeypot
logs. Through keyword searching and the visualisation tools provided by ELK, they were able
to distinguish attack events from benign events and break down these attack events into various
subcategories for further analysis. This research provides an insight into the effectiveness
of inference engines like the ELK stack for cyber threat hunting; however, it is an analysis of all
the honeypot log data rather than the attribution of events to an actor in a CTF environment.
It also relies on a manual, iterative process to analyse the log data.
   Al Shibana & Anupriya (2019) [10] propose an automated approach to threat hunting with
inference engines. This research creates a series of detection rules tailored to indicators of
compromise captured by Sysmon that are then indexed by the inference engine. When the
forwarded events match the detection rule, the analyst is alerted. Whilst this approach is
effective as an intrusion detection system, our research works inversely; we know how the
challenge is to be compromised and must analyse past events and attribute these events to the
participants who compromised the challenge. Similarly, the authors of [11] utilise indicators of
compromise and an inference engine for assessing and classifying threat levels. This research however
utilises Sysmon logs on Windows clients, whereas our scope is focused on the logging of
HTTP interactions gathered from CTF challenge clients. This presents an opportunity to explore
new methods of detection through the utilisation of a threat hunting methodology that translates
a series of challenge dependencies into indicators of compromise (IOC), unique event signatures
that can be queried by an inference engine. Whilst Daszczyszak et al. (2019) [13] indicate that
IOCs are sensitive to polymorphism and metamorphism, our web hacking challenges are static
with expected attack patterns that a participant must fulfil. Therefore focusing on these IOCs is
not problematic within our scenario.


4. Methods
Each web hacking challenge has a series of predefined dependencies that a participant should
fulfil to acquire the challenge flag. This is because each step in the challenge provides further
information or interaction that the participant needs to complete the challenge. When a participant
fulfils each step within the challenge dependency, the detection system deems the participant benign;
when steps are missing or out of order, the participant is deemed suspicious.
   Every step of a challenge dependency is treated as an indicator of compromise (IOC) with
unique elements that define it. This allows the detection system to query these specific
elements, match the actions of participants to these IOCs across the entire index of captured
data and analyse the series of steps for suspicious activities. Furthermore, we are the authors
of the CTF challenges and therefore know what is required of a participant to get from one step
to another. Because of this we can manually generate the IOCs that are to be searched for in the
indexed security logs by the CTF querier [14]. Where steps can be fulfilled in multiple ways, it
is possible to generate multiple IOCs for a challenge step and have the automated system query
the series of possible IOCs to determine if a participant has fulfilled the challenge step.
   Example steps can be found in Figure 2 for a web hacking challenge that requires a
brute-force attack to be conducted to acquire the challenge flag. To fulfil the dependencies the
participant must first make a request to the web page to view its contents, interact with the
web login form that is present on the site, conduct the brute-force attack, log in to the site
using the brute-forced credentials and finally acquire the flag. If a participant requests the flag file
without conducting the brute-force attack then this is suspicious activity.
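
   As a minimal sketch of how such a dependency chain could be represented, the following Python
fragment models each step with one or more alternative IOC labels and reports the first step that
has no matching event. The names Step, BRUTE_FORCE_CHALLENGE and first_unfulfilled, as well as the
event labels, are hypothetical and chosen for illustration only; they are not taken from the
CTF querier itself.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    """One challenge step; any single alternative label fulfils it."""
    name: str
    alternatives: Tuple[str, ...]


# Hypothetical labels for the brute-force example; real IOCs would be queries.
BRUTE_FORCE_CHALLENGE = (
    Step("view_page", ("GET /",)),
    Step("submit_login_form", ("POST /login",)),
    Step("brute_force", ("MANY_FAILED_LOGINS",)),
    Step("login_with_found_credentials", ("LOGIN_OK",)),
    Step("acquire_flag", ("GET /flag",)),
)


def first_unfulfilled(steps: Sequence[Step], observed: Iterable[str]) -> Optional[str]:
    """Return the name of the first step with no matching event, else None."""
    seen = set(observed)
    for step in steps:
        if not seen.intersection(step.alternatives):
            return step.name
    return None
```

   Under these assumptions, a participant whose only observed events are "GET /" and "GET /flag"
would be reported with submit_login_form as the first unfulfilled step.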


Figure 2: Example simplified steps required to fulfill a challenge dependency related to a web brute-
forcing attack


4.1. Infrastructure
To be able to process and detect suspicious activities of participants within the web hacking
challenges, the detection system requires access to the challenge logs.
  To facilitate this, the infrastructure in Figure 3 has been developed to acquire the logs from
individual web hacking challenges, process these logs into a universal format and then store
these logs in a centralised location for querying.


Figure 3: Infrastructure design of detection system


   To analyse each step and determine if a participant has fulfilled the dependencies, a threat
hunting approach is applied. Each step can be transformed into an indicator of compromise
(IOC) that is unique to each challenge.




   Logging Agent: Each web hacking challenge has an agent that logs all HTTP interactions.
These logs include all IP addresses, the timestamp of individual interactions, GET and POST
requests, cookies and web user agents. Similarly, all Apache web server interactions are
logged for each challenge. These logs are then analysed to match challenge dependencies to
participants’ activities. Focusing only on HTTP is sufficient as the current challenges
are web-based and can only be interacted with via HTTP requests.
   Centralised Logging Directory: To avoid overloading the infrastructure that facilitates the
web hacking challenges, these logs are stored in a centralised directory located on a separate
logging server. This allows for acquisition and processing of all logs from a single source point
without the need to communicate with the web challenge agents. This solution is scalable as
the only dependency is the capacity of the centralised system’s storage space, which
dictates the number of challenges that can be logged.
   Log Preprocessor: The log preprocessor then acquires all logs from the centralised logging
directory. If the logs are not Apache access or Apache error logs then further preprocessing is
required; otherwise the Elastic Filebeat agent parses the logs and ships them to the Elasticsearch
Filebeat index. The preprocessing stage for non-Apache logs converts the logs into a .csv
formatted data set and applies additional logic to the logs dependent on the web hacking
challenge category. Furthermore, these data sets can also be used for future research.
   An example of this additional logic is the following scenario: The web hacking challenge
requires a user to exploit a vulnerability in its search parameter. This vulnerability exposes
a directory path when the user submits a search string of more than 20 characters. The
preprocessing logic gets the length of each submitted search parameter and adds this to the
dataset. Since this data isn’t relevant to a challenge configured for SQL injection it is only
processed for challenges that require such a vulnerability to be utilised.
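
   The length-based preprocessing described above could be sketched as follows. This is an
illustration under stated assumptions rather than the preprocessor itself: the column name
requested_page, the parameter name search and the helper names add_search_length and to_csv
are all hypothetical.

```python
import csv
import io
from urllib.parse import parse_qs, urlsplit


def add_search_length(rows, param="search"):
    """Yield a copy of each log row with a 'search_length' column added.

    The length is taken from the first value of the (assumed) query
    parameter; rows without that parameter get a length of 0.
    """
    for row in rows:
        query = urlsplit(row["requested_page"]).query
        values = parse_qs(query, keep_blank_values=True).get(param, [""])
        enriched = dict(row)
        enriched["search_length"] = len(values[0])
        yield enriched


def to_csv(rows):
    """Serialise the enriched rows as CSV text with a header line."""
    rows = list(rows)
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

   With the extra column in place, a query for the step only needs a numeric comparison on
search_length instead of inspecting the raw parameter string.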
   Elasticsearch: After preprocessing, the data is output to Elasticsearch either as a Filebeat
index containing all Apache access and error logs or as an index containing the custom CTF
logs of a specific web hacking challenge.
   Querying: The Elasticsearch indices can then be queried either through the automated CTF
querier to get a holistic view of a participant’s actions or by using the ’Discover’ module in
Kibana for search queries. The query parameters are defined by the web hacking challenge
dependencies and look for matching signatures in the HTTP traffic. Limitations with Discover
were found when querying multiple indices with different formatting types. Because the content
of the Apache logs and that of the Filebeat agent do not share the same format, content was missing
from the returned search results. This creates inefficiencies when using the Discover module for
searching and detecting suspicious activities. A field alias could be assigned to the field
names in the indices as a workaround, but this only returns the data of the aliased fields. Therefore
we have developed an automated CTF querying tool for interacting with Elasticsearch indices.

4.2. Logging
Each web hacking challenge is logged using two logging types. The first type is the Apache
access and error logs that are generated by the Apache Web Server contained on each system.
Access logs are formatted using the Apache combined log format to include the following
elements:




    • IP Address
    • Timestamp
    • HTTP request method (GET/POST)
    • Status code
    • Return byte size
    • Referrer
    • Web user agent
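
  For illustration, the access log elements listed above could be extracted from a single line in
the Apache combined log format with a regular expression. In our infrastructure this parsing is
performed by Filebeat; the sketch below is only an assumption-laden stand-in, and the names
COMBINED and parse_access_line are hypothetical.

```python
import re
from datetime import datetime

# Pattern for the Apache combined log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_access_line(line):
    """Return the access log elements as a dict, or None if the line does not match.

    Lines with a malformed request field are deliberately not parsed.
    """
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    record["status"] = int(record["status"])
    # Apache writes "-" when no bytes were returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record
```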
  Error logs are formatted using the Apache default formatting and contain any errors re-
turned by the system. These may include requested documents unavailable, PHP errors, access
revocation.
  The final logging type is a custom CTF logging format that is used for all web hacking
challenges. This logging format contains the following elements:
    • Timestamp
    • IP address
    • Challenge name
    • Requested page
    • HTTP GET content
    • HTTP POST content
    • Site cookies
    • Web user agent
    • Unique ID
  By collecting and indexing these log elements they can be queried for indicators of compromise
where the elements can be attributed to a participant fulfilling challenge dependencies.

4.3. CTF Querier
The CTF Querier is an automated tool that leverages the Python Elasticsearch Client [15] for
querying the Elasticsearch indices. These queries are the IOCs that are generated from the
challenge dependencies.
  Since the CTF Querier is designed to work inversely, focusing on the steps conducted before
the captured flag was obtained, the final flag query is used to obtain participants who compro-
mised the web challenge and narrow the scope of the queries. This not only saves time and
increases detection rates but ensures that the system only analyses the actions of subjects that
compromised the web challenges. Once the system has obtained a list of participants that match
the IOC of the final challenge step it conducts the following:
   1. Get timestamp of final flag IOC for each participant.
   2. For each step in a challenge dependency gather the following:
        a) Get all participants who match the IOC
        b) Get initial timestamp of IOC match for each participant
   3. Check the fulfilment of challenge dependencies for each participant by checking the
      following:
         a) Did the participant complete the step or is a match missing?
               i. If a match is missing define the lack of action as suspicious
              ii. Update suspicious list with participant details
         b) Did the challenge step occur before a previous step?
               i. If so define the action as suspicious
              ii. Update suspicious list with participant details
   4. Create a report for the analyst that indicates the number of times a participant appeared
      in the suspicious list and the suspicious actions that are attributed to that participant
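
  The procedure above can be sketched in a few lines of Python. This is a simplified illustration,
not the CTF Querier's actual code: it assumes the IOC queries have already been run and that their
results are available as a mapping from step name to the first match timestamp per participant.
The function name classify_participants and the finding strings are hypothetical.

```python
def classify_participants(steps, first_match):
    """Classify every participant who matched the final (flag) step.

    steps:       ordered list of step names, the last one being the flag step
    first_match: {step name: {participant: first timestamp of an IOC match}}
    Returns {participant: [findings]}; an empty list means benign.
    """
    final_step = steps[-1]
    report = {}
    # Work inversely: only participants who obtained the flag are analysed.
    for participant in first_match.get(final_step, {}):
        findings = []
        latest_seen = None
        for step in steps:
            when = first_match.get(step, {}).get(participant)
            if when is None:
                findings.append(f"missing step: {step}")
                continue
            if latest_seen is not None and when < latest_seen:
                findings.append(f"out of order: {step}")
            latest_seen = when if latest_seen is None else max(latest_seen, when)
        report[participant] = findings
    return report
```

  A participant with an empty findings list is treated as benign; the length of the list
corresponds to the number of times the participant would appear in the suspicious list.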
  The following example provides a subset of IOCs for a CTF challenge which can be queried
by the CTF querier. These logged HTTP requests indicate if a participant has fulfilled a step in
the challenge dependency.

   1. "match_phrase": {"query": "*audi"}
   2. "match_phrase": {"query": "*\\/etc/passwd*"}
   3. "match_phrase": {"url.original":
      "/index.php?car=php://filter/convert.base64-encode/resource=index.php"}
   4. "match_phrase": {"url.original": "/loginforusers/index.php"}
   5. "match_phrase": {"post": "POST: car=' or position()=3]/*[5]|a[';\"}
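
  As a hedged illustration of how one of these IOCs might be wrapped into a complete Elasticsearch
request body, the sketch below builds a bool query around a match_phrase clause. The helper name
ioc_query and the field names source.ip and @timestamp are assumptions based on common Filebeat
conventions and may differ from the fields used in our indices. The resulting dictionary would
then be handed to the search call of the Python Elasticsearch Client; how it is passed depends on
the client version.

```python
def ioc_query(field, phrase, participant_ip=None, before=None):
    """Build a request body that returns the earliest event matching one IOC.

    field / phrase:  the match_phrase pair that defines the IOC
    participant_ip:  optional restriction to one participant (assumed field name)
    before:          optional upper timestamp bound, e.g. the flag timestamp
    """
    filters = []
    if participant_ip is not None:
        filters.append({"term": {"source.ip": participant_ip}})
    if before is not None:
        filters.append({"range": {"@timestamp": {"lte": before}}})
    return {
        "query": {
            "bool": {
                "must": [{"match_phrase": {field: phrase}}],
                "filter": filters,
            }
        },
        # Oldest hit first, so the single result is the initial IOC match.
        "sort": [{"@timestamp": {"order": "asc"}}],
        "size": 1,
    }
```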

4.4. Testing of CTF Querier
To test the CTF Querier’s accuracy, several web hacking challenges were chosen, each with two
simulated participants interacting with the challenge. The chosen web hacking challenges
are of different challenge types to ensure the CTF querier is not limited to a single challenge
type.
  The two participants always take the following approaches:

      • Benign Actor: This participant fulfills all challenge dependencies.
      • Malicious Actor: For each challenge type the malicious participant does not fulfill
        challenge dependencies, deliberately misses challenge steps or completes steps out of
        order.

   As the detection system predicts the participant’s set of actions as either benign or suspicious
based upon matching the actions to IOCs, the confusion matrix in Figure 4 can be used
to quantify the performance of the detection system and compare the True Positives (TP), False
Positives (FP), True Negatives (TN) and False Negatives (FN) generated by the system. A correct
prediction of suspicious activity is a TP. Similarly, a correct
prediction of benign activity is a TN. Incorrect predictions of suspicious activity are FP
and incorrect predictions of benign activity are FN.
   Precision and recall are derived from the confusion matrix and are used to statistically analyse the
performance of the cheat detection system. Precision is the proportion of positive
predictions that truly belong to the positive class. It is an important metric for evaluation due to
the impact of falsely reporting plagiarism for participants.




Figure 4: Confusion Matrix for cheat detection system prediction outcomes


                                      \[ \mathrm{Precision} = \frac{TP}{TP + FP} \]
  Recall is the proportion of actual positive events that are correctly predicted as positive. As
with precision, it is an important metric because failing to report plagiarism (a false negative) has a high impact.
                                      \[ \mathrm{Recall} = \frac{TP}{TP + FN} \]
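
  These metrics, together with the negative predictive value (NPV), false positive rate and
accuracy reported in the results, follow directly from the four confusion matrix counts. The
short sketch below shows the arithmetic; the function name detection_metrics is hypothetical
and any counts passed to it in an example are illustrative, not measured values.

```python
def detection_metrics(tp, fp, tn, fn):
    """Derive the evaluation metrics from confusion matrix counts.

    A metric is None when its denominator is zero and it is undefined.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "npv": ratio(tn, tn + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }
```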

5. Results
To assess the performance of the detection system, four different web hacking challenges were
chosen for testing. Each challenge has a different series of challenge dependencies and overall
goal for the participant to fulfil. The actions of each participant follow the previously discussed
benign and malicious participant approaches: the benign participant conducts the series
of steps in order whilst the malicious participant avoids steps.
   The main purpose of the assessment is to determine how effectively the detection system
queries the challenge dependency IOCs and identifies the suspicious or benign activities
that are attributed to a participant.
   For testing, the benign and malicious participants were simulated by the analysts. All results
from these tests have been anonymised to preserve privacy.
   The first web hacking challenge used for testing is an information disclosure challenge that
requires the participant to submit a disclosed parameter string to the web server to retrieve the
flag. The challenge dependencies state the following order of five actions:

   1. The participant requests the index page of the challenge




   2. The participant requests and analyses the robots.txt, containing a list of directories to be
      excluded from web crawlers.
   3. The participant requests the /Something/ directory.
   4. The participant requests the /PennyLane/ directory.
   5. The participant submits the web form with the string ’Hello’ in the ’greeting’ parameter.
      This returns the flag.

  Our simulated set of actions for the benign user fulfils all of the challenge dependencies in
order. The malicious participant, however, performs the following set of actions:
   1. The participant skips all stages up to step 4 and fulfils step 4 by requesting the
      /PennyLane/ directory.
   2. The participant fulfils step 5, submitting ’Hello’ within the ’greeting’ parameter.

   Using the queries generated from the challenge dependencies for the information disclosure
challenge, the CTF querier analysed and classified all participant actions with 100% precision,
100% recall, 100% NPV and a 0% false positive rate, as seen in the confusion matrix in Figure 6.
   Figure 5 provides an example output of the summarised events generated by the CTF querier.
In this example, the results are obtained from the captured data related to the information
disclosure challenge.


Figure 5: Summary of classified set of simulated actions for information disclosure web hacking
challenge


   The next challenge for testing is a local file inclusion attack. The participant is required to
inspect the source code of a server-side script file by exploiting a local file inclusion vulnerability.
The server-side source code exposes a hidden site that is vulnerable to XPath injection attacks.
By exploiting the XPath injection vulnerability the flag contained within an XML file is exposed.
The dependencies of this challenge are as follows:
   1. The participant requests the index page of the web challenge.
   2. The participant analyses the website and interacts with the ’car’ parameter that returns a
      .txt file prefixed with the brand of car the user inputs.
   3. The participant tries out local file inclusion on the car parameter by requesting: car=/etc/passwd/
   4. The participant tries to base64 encode the source file of the index.php page using the follow-
      ing string in the ’car’ parameter: php://filter/convert.base64-encode/resource=index.php
   5. Once the participant has decoded the base64 encoded source code they will see a reference
      to a ’/loginforusers/’ directory. We expect the participant to request this directory.
   6. The participant needs to conduct XPath injection on the login form to expose the flag
      using the following string: ’ or position()=3]/*[5]|a[’.
         a) Note: Whilst the participant needs to map the size of the XML document using
            the correct position numbers, if they try a randomised list of numbers it is
            probable that they succeed on the first attempt.


Figure 6: Confusion Matrix for cheat detection system prediction for information disclosure simulated
test


  As before, the benign user conducts all steps in the challenge dependency. The set of actions
for the malicious user are as follows:

   1. Malicious participant requests the web challenge index page.
   2. Malicious participant skips the local file inclusion steps, moving directly to step 5.
   3. Malicious participant requests the ’/loginforusers/’ directory.
   4. Malicious participant submits the correct xpath injection string to produce the challenge
      flag.

   Using the queries generated from the challenge dependencies for the local file inclusion
challenge, the CTF querier analysed and classified all participant actions with 100% precision,
100% recall, 100% NPV and a 0% false positive rate, as seen in the confusion matrix in Figure 7.
   The third challenge used for the simulated set of actions requires the participant to post a
string of more than 31 characters to disclose a hidden directory in an error message. The set
of actions is the following: the benign participant fulfils all challenge dependencies whilst the
malicious participant is provided the flag URI by a friend, thus skipping all steps. The
confusion matrix in Figure 8 summarises the classified actions by the CTF querier.
   The final challenge used to analyse the simulated set of actions requires the participant to
tamper with a numerical query parameter until the correct parameter number is identified. The
set of actions for this challenge is the following: the benign participant fulfils all challenge
dependencies, whilst the malicious participant only requests the index page and submits the correct
numerical parameter on the first attempt. The confusion matrix in Figure 9 summarises the classified
actions by the CTF querier.


Figure 7: Confusion Matrix for cheat detection system prediction for file inclusion simulated test


Figure 8: Confusion Matrix for cheat detection system prediction for third simulated scenario


  The following confusion matrix in Figure 10 contains the values of all action classifications
by the CTF querier for all challenges. The accuracy, precision and recall and NPV are all 100%


186                                                          Proceedings of the 28th C&ESAR (2021)
                                                                           R. Chetwyn and L. Erdodi


Figure 9: Confusion Matrix for cheat detection system prediction for final simulated test


with a 0% false positive rate when classifying the actions as either benign or malicious.


Figure 10: Final confusion matrix for all challenge predictions determined by the CTF Querier
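
   The metrics reported for the confusion matrices follow the standard definitions, which can be
sketched as below. The counts in the example are hypothetical and are not taken from the figures;
they only illustrate that zero false positives and zero false negatives yield 100% precision,
recall, NPV and accuracy with a 0% false positive rate. The positive class is assumed to be
"malicious".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # malicious actions classified as malicious
    fp: int  # benign actions classified as malicious
    tn: int  # benign actions classified as benign
    fn: int  # malicious actions classified as benign

    @staticmethod
    def _ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of raising when a class is empty.
        return numerator / denominator if denominator else 0.0

    def precision(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        # Also called sensitivity.
        return self._ratio(self.tp, self.tp + self.fn)

    def npv(self) -> float:
        # Negative predictive value.
        return self._ratio(self.tn, self.tn + self.fn)

    def false_positive_rate(self) -> float:
        return self._ratio(self.fp, self.fp + self.tn)

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return self._ratio(self.tp + self.tn, total)


# Hypothetical counts, chosen only to illustrate a matrix with no misclassifications.
example = ConfusionMatrix(tp=4, fp=0, tn=12, fn=0)
```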


6. Discussion
Our research goal is to explore the detection of plagiarism in CTF games. We achieve this goal
through the automated CTF querier. The application of a threat hunting method for generating
signature-based queries for challenge dependencies greatly increases the CTF querier’s accuracy
and in turn reduces the false positive rate when classifying a participant’s actions as benign
or suspicious.
   Analysis of the final confusion matrix shows that the CTF querier is able to accurately perform
multi-phase event detection and classification of captured web traffic for user-defined web
hacking challenges. The CTF querier is capable of analysing vast quantities of indexed HTTP
log data, correctly classifying malicious and benign events and attributing those events to a
participant with no false positives. Furthermore, by utilising signature-based threat hunting
methods to find IOCs pertaining to challenge dependencies, the CTF querier is not prone
to false positives, unlike the similar research conducted by Kakouros [8]. Kakouros’ [8] approach
to cheat detection in ethical hacking was prone to false positives in preliminary testing due
to the infrastructure failing to log events. This resulted in a sensitivity of 67% and an
accuracy of 75%, whereas the CTF querier had no false positives and correctly classified
each malicious and benign event. After Kakouros [8] reconfigured their infrastructure, the
accuracy improved to 91%. There were no instances of events failing to log within the CTF
querier infrastructure, which further supports confidence in our implementation. Our current
implementation is limited to attributing events to individual participants and cannot attribute
group-based plagiarism, as in the case of participants sharing information with one another.
   A limitation of this research is that the current challenges are only web-based, where a series
of steps guides the participant to the challenge flag. Future work could expand into host-based
CTF challenges where participants’ activities differ more from one another, requiring the
system to detect and score IOCs automatically rather than relying on manual entry into the CTF
querier by an administrator.
   Because the scope of the CTF challenges is web applications, the focus is only on the
logging of HTTP traffic; the CTF querier is therefore currently limited to analysing these
types of events. However, the adopted threat hunting methodology can be adapted to analyse
signatures in other log formats. Similarly, as the infrastructure currently uses only the ELK
stack, the queries are written only for this vendor. Future work could generate queries from the
challenge dependencies in a standardised format for SIEM tools, using the Sigma rule format to
achieve interoperability [16].
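
   One way to picture the separation between a vendor-neutral challenge dependency and a
vendor-specific query is the sketch below. It keeps the IOC description independent of the
backend and renders it as an Elasticsearch query DSL body. The class, the function and the field
names ’participant’, ’http_method’ and ’request_uri’ are assumptions made for illustration and
are not the logging schema or query generator used by the CTF querier; the sketch also assumes
these fields are indexed as keyword fields.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class StepIOC:
    """Vendor-neutral description of one challenge dependency step (hypothetical)."""
    name: str
    method: str        # HTTP method expected for the step
    uri_contains: str  # substring expected in the request URI


def to_elasticsearch_query(ioc: StepIOC, participant: str) -> Dict[str, Any]:
    """Render one IOC as an Elasticsearch query DSL body for a single participant.

    The field names used here are assumptions, not the authors' schema.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"participant": participant}},
                    {"term": {"http_method": ioc.method}},
                    {"wildcard": {"request_uri": {"value": f"*{ioc.uri_contains}*"}}},
                ]
            }
        }
    }


# Example IOC built from the '/loginforusers/' directory mentioned in the text.
login_step = StepIOC(name="login_directory", method="GET", uri_contains="/loginforusers/")
query_body = to_elasticsearch_query(login_step, participant="student-01")
```

   A second rendering function for another backend or rule format could consume the same StepIOC
objects, which is the kind of interoperability the paragraph above describes.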


7. Conclusion
CTF style games are popular delivery methods for ethical hacking education; however, there
is limited research on verifying the legitimacy of CTF participants’ activities and detecting
plagiarism for educational examinations in CTF style games. To solve this problem, a lightweight
automated querying tool called the ‘CTF Querier’ is proposed that queries participants’ activities
and checks for plagiarism or abnormalities in a fast, efficient and scalable way. This is achieved
by combining cyber threat hunting methods with security analytics.
   The combined method of cyber threat hunting and security analytics is achieved in three parts.
Firstly, by developing a simple, lightweight and interoperable HTTP logging format that is
indexed in a centralised database for later querying. Secondly, by transforming the steps a
participant must fulfill into a series of IOCs for matching indexed participant activities.
Finally, through the automated CTF querier that queries the indexed activities for these IOCs.
By transforming the steps participants must take to fulfill a challenge into a series of IOCs,
the presented CTF querier can automatically verify whether a participant has fulfilled the
challenge steps and match unexpected, missing or suspicious participant activities. Furthermore,
it is capable of making this decision without false positives in our evaluation.
   To test the accuracy and precision of the presented CTF querier, a dataset is used that contains
the captured participant activities from several CTF style educational components alongside the
captured activities of simulated participants. As the results show, the CTF querier can
classify a participant’s activities with high precision and no false positives when querying the
dataset for the challenge step IOCs. Currently, the presented CTF querier hunts for signatures in
captured participant activities; future work can include statistical methods to aid machine
learning based predictions of participants’ activities.


References
 [1] HackTheBox, Hacking training for the best, 2021. URL: https://www.hackthebox.eu/.
 [2] C. Academy, 2021. URL: https://ctfacademy.github.io/index.htm.
 [3] M. Lehrfeld, P. Guest, Building an ethical hacking site for learning and student engagement,
     SoutheastCon 2016 (2016). doi:10.1109/secon.2016.7506746.
 [4] L. Erdodi, Hacking arena security lab - department of informatics, n.d. URL:
     https://www.hackingarena.no/home/index.html.
 [5] J. Porup, OSCP cheating allegations a reminder to verify hacking skills when hiring, 2019. URL:
     https://www.csoonline.com/article/3336068/oscp-cheating-allegations-a-reminder-to-verify-hacking-skills-when-hiring.html.
 [6] J. Adams, The dangers of exam dumps, 2016. URL:
     https://www.cbtnuggets.com/blog/career/career-progression/the-dangers-of-exam-dumps.
 [7] D. J. Bianco, The pyramid of pain, 2014. URL:
     http://detect-respond.blogspot.com/2013/03/the-pyramid-of-pain.html.
 [8] N. Kakouros, A cheat detection system for an educational pentesting cyber range: an
     intrusion deficit approach, Master’s thesis, KTH, School of Electrical Engineering and
     Computer Science (EECS), 2020.
 [9] H. Al-Mohannadi, I. Awan, J. Al Hamar, Analysis of adversary activities using cloud-
     based web services to enhance cyber threat intelligence, Service Oriented Computing and
     Applications (2020). doi:10.1007/s11761-019-00285-7.
[10] M. Al Shibani, E. Anupriya, Automated threat hunting using ELK stack – a case study,
     Indian Journal of Computer Science and Engineering (2019).
     doi:10.21817/indjcse/2019/v10i5/191005008.
[11] V. Mavroeidis, A. Jøsang, Data-driven threat hunting using sysmon, in: ACM International
     Conference Proceeding Series, 2018. doi:10.1145/3199478.3199490.
[12] Elastic, ELK stack: Elasticsearch, Logstash, Kibana, n.d. URL:
     https://www.elastic.co/what-is/elk-stack.
[13] R. Daszczyszak, D. Ellis, S. Luke, S. Whitley, 2019. URL:
     https://www.mitre.org/sites/default/files/publications/pr-19-3892-ttp-based-hunting.pdf.
[14] R. A. Chetwyn, CTF querier, 2021. URL:
     https://github.com/chetwynr/CTF-PlagiariasmDetection/.
[15] S. M. Larson, Python Elasticsearch client, 2021. URL:
     https://elasticsearch-py.readthedocs.io/en/v7.12.1/.
[16] F. Roth, T. Patzke, Sigma - generic format for SIEM systems, n.d. URL:
     https://github.com/SigmaHQ/sigma.



</pre>