<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Models to Analyze and Identify Cybersecurity Incidents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arsen Pavlov</string-name>
          <email>pavlov_arsen@outlook.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miroslava Ruzickova</string-name>
          <email>m.ruzickova@math.uwb.edu.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irada Dzhalladova</string-name>
          <email>idzhalladova@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleg Kaminsky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksandr Bartash</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cybersecurity AI</institution>
          ,
          <addr-line>Long Language Model, ChatGPT, random process</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kyiv National University of Economics named after Vadym Hetman</institution>
          ,
          <addr-line>54/1 Prospect Peremogy, 03057 Kyiv</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Białystok, Faculty of Mathematics and Informatics</institution>
          ,
          <addr-line>Białystok</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>9</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>Many applied cybersecurity methods require time-consuming calculations and therefore depend on specialized software. For this reason, using artificial intelligence tools to automate routine tasks in cybersecurity analytics remains a relevant problem. This study examines the analysis of firewall logs to detect cybersecurity incidents using large language models, Python, and the Pandas library, with the aim of freeing the cybersecurity analyst from writing program code so that they can focus on formulating the right tasks for AI systems. The article also proposes a model for evaluating the effectiveness of technical information protection against unauthorized access using a functional approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>We live in the information age, with a greater abundance of information sources than at any
earlier point in history. The field of cybersecurity is no exception. Among the most valuable data
sources for analyzing cybersecurity incidents are server and firewall logs. These logs provide
information about network connections and internal organizational traffic and, in some cases, about
user access to VPNs. In this era of heightened threats, the integration of artificial intelligence (AI)
and large language models (LLMs) is becoming a transformative force in cybersecurity.</p>
      <p>
        A large language model is a type of generative artificial intelligence model that stands out for
its ability to understand and generate general-purpose language. Essentially, it is an algorithm
trained on a massive text dataset to learn the syntax and semantics of language. As a result, a large
language model can interpret, analyze, and generate human-like sentences and other text. Notable
examples include OpenAI’s GPT models (such as GPT-3.5 and GPT-4, used in ChatGPT), Google’s
PaLM (used in Bard), and Meta’s LLaMA [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        In a study [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the development process of programs using GPT-4 and ChatGPT was analyzed.
Clear and detailed explanations of artificial intelligence concepts were provided, along with practical
guidelines for effective, secure, and economical integration of OpenAI services.
      </p>
      <p>
        Analysis of cybersecurity incidents involves a deep examination that determines the level of
danger, extent of damage, and losses, as well as the detection of artifacts (traces or samples of
malicious software). In the work by E. Chou [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the application of high-level Python packages and
frameworks is discussed for tasks related to network automation, programming, and security data
analysis, including Azure and AWS Cloud. For those who wish to become more deeply acquainted
with the object-oriented language Python, we suggest reading one of the most famous books [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>2023 Copyright for this paper by its authors. CEUR Workshop Proceedings (ceur-ws.org).</p>
      <p>
        The study [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] explores the interplay between artificial intelligence, machine learning, and deep
learning, analyzing the impact of large language models such as ChatGPT and Bard [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] on society and
professional competencies.
      </p>
      <p>
        Additional difficulties arise when complex models are described more adequately, for
example by taking into account the time delay (deviation of the argument) [
        <xref ref-type="bibr" rid="ref7 ref8">7,8</xref>
        ].
      </p>
      <p>Many applied cybersecurity methods require labor-intensive calculations, necessitating the use of
specialized software for their implementation. Therefore, the question of applying artificial
intelligence tools in cybersecurity analytics to automate routine tasks remains relevant.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Main results</title>
      <p>
        Problem Statement: According to Gartner, Inc [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], security department experts must reassess
their investment balance between protective technologies and a human-centric approach to
cybersecurity when developing and implementing enterprise cybersecurity systems, in line with new
technological trends.
      </p>
      <p>Let’s first define the problem we need to address: conducting an analysis of firewall logs in the
security system to identify data artifacts.</p>
      <p>A log is a text file containing information about software actions or user activities, stored on a
computer or server. It serves as a chronological record of events and their sources, errors, and reasons
behind them.</p>
      <p>Log analysis is a fundamental tool for cybersecurity professionals. It helps uncover the sources of
various issues, detect conflicts in configuration files, and track security-related events. However,
reading and analyzing logs at scale is practical only with specialized software.</p>
      <p>Let’s consider a model for evaluating the effectiveness of technical information security against
unauthorized access using a functional approach. The essence of the functional approach is as
follows:</p>
      <p>Let there be an information system (IS) in which, according to regulatory requirements, a certain
set of protective measures <inline-formula><tex-math>F_k</tex-math></inline-formula> must be applied to achieve a specified level (class) of
security <inline-formula><tex-math>k = \overline{1, K}</tex-math></inline-formula>. In practice, however, only a subset <inline-formula><tex-math>f_k</tex-math></inline-formula> of the measures in
<inline-formula><tex-math>F_k</tex-math></inline-formula> has been implemented in the IS. All combinations of protective measures from
<inline-formula><tex-math>F_k</tex-math></inline-formula> can be arranged in ascending order of their effectiveness, i.e., their impact on
enhancing information security in the IS.</p>
      <p>
        For example, according to [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], if an information system needs to be protected at the third level of security, it should
implement 63 protective measures (<inline-formula><tex-math>F_k = 63</tex-math></inline-formula>). The number of non-empty combinations of
these measures is <inline-formula><tex-math>N_k = 2^{F_k} - 1 = 2^{63} - 1</tex-math></inline-formula>. As we move from one combination to the
next in this ordering, the effectiveness of protection increases. An approximate indicator of
information security effectiveness can be expressed by the following ratio:
      </p>
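      <p><disp-formula><label>(1)</label><tex-math>E_k = \frac{1}{N_k} \sum_{n_k = 1}^{N_k} \prod_{f_k \in n_k} \delta(f_k),</tex-math></disp-formula>
where <inline-formula><tex-math>E_k</tex-math></inline-formula> denotes the effectiveness indicator, <inline-formula><tex-math>N_k</tex-math></inline-formula> is the number of combinations,
and <inline-formula><tex-math>\delta(f_k)</tex-math></inline-formula> is the Kronecker delta function, which equals 1 if the protection measure with the
number <inline-formula><tex-math>f_k</tex-math></inline-formula>, included in the combination <inline-formula><tex-math>n_k</tex-math></inline-formula>, is implemented in the system, and 0 otherwise.</p>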
      <p>Figure 1 illustrates how the effectiveness of third-level security protection depends on the current
combination number <inline-formula><tex-math>n_k</tex-math></inline-formula> of protective measures.</p>
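      <p>Formula (1) can be checked numerically on a small system. The following is a minimal sketch, not the
authors' implementation: it assumes that the sum in (1) is normalized by the number of non-empty
combinations, and it uses a hypothetical system with only 5 required measures, since enumerating the
combinations of 63 measures is infeasible. The function name and the sample sets are assumptions made
for the example.</p>
      <preformat>
```python
from fractions import Fraction
from itertools import combinations


def effectiveness(required, implemented):
    """Share of non-empty combinations of required measures that are fully implemented."""
    required = sorted(required)
    implemented = set(implemented)
    total = 0    # number of non-empty combinations (N_k in the text)
    covered = 0  # combinations in which every measure is implemented
    for size in range(1, len(required) + 1):
        for combo in combinations(required, size):
            total += 1
            # Product of the delta terms: 1 only if all measures of the combination are implemented
            if all(measure in implemented for measure in combo):
                covered += 1
    return Fraction(covered, total)


required_measures = {1, 2, 3, 4, 5}   # hypothetical set of required measures
implemented_measures = {1, 2, 4}      # hypothetical implemented subset

print(effectiveness(required_measures, implemented_measures))  # 7/31
```
      </preformat>
      <p>With 3 of 5 measures implemented, 7 of the 31 non-empty combinations are fully covered, which matches
the closed form (2^3 - 1) / (2^5 - 1).</p>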
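      <p>Instead of formula (1), an approximate assessment of the effectiveness indicator <inline-formula><tex-math>E_k</tex-math></inline-formula> can be
expressed using the following ratio:
<disp-formula><tex-math>E_k = \frac{2^{n_k^*} - 1}{2^{N_k} - 1},</tex-math></disp-formula>
where <inline-formula><tex-math>n_k^*</tex-math></inline-formula> represents the number of combinations of events in which all the events included in
the combination occur. For a sufficiently large number of combinations, <inline-formula><tex-math>E_k</tex-math></inline-formula> can be approximated
using the following formula:
<disp-formula><tex-math>E_k(v_k) \approx 2^{-N_k (1 - v_k)},</tex-math></disp-formula>
where <inline-formula><tex-math>v_k = n_k^* / N_k</tex-math></inline-formula> represents the proportion of realized protection measures out of the total
number of protection measures that need to be implemented in the system.</p>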
      <p>The relationship between the effectiveness of protection and the proportion of implemented
protective measures in a third-class security system is shown in Figure 2.</p>
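      <p>The quality of the approximation can be checked numerically. The sketch below rests on stated
assumptions: the exact ratio is read as (2^a - 1) / (2^b - 1), the approximation as 2 raised to the
power -b(1 - v) with v = a / b, and the value b = 63 is chosen purely for illustration. The function and
variable names are hypothetical.</p>
      <preformat>
```python
from fractions import Fraction


def exact_ratio(n_star, n_total):
    """Exact value of (2**n_star - 1) / (2**n_total - 1), computed with integers."""
    return float(Fraction(2**n_star - 1, 2**n_total - 1))


def approximation(n_star, n_total):
    """Approximation 2**(-n_total * (1 - v)) with v = n_star / n_total."""
    v = n_star / n_total
    return 2.0 ** (-n_total * (1 - v))


n_total = 63  # illustrative value only
for n_star in (1, 16, 32, 48, 63):
    # Columns: n_star, exact ratio, approximation
    print(n_star, exact_ratio(n_star, n_total), approximation(n_star, n_total))
```
      </preformat>
      <p>Under these assumptions the two values agree closely once the numerator exponent is large, while for
very small exponents the approximation overestimates the exact ratio by up to a factor of about two.</p>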
      <p>Using a human-centric approach, the following hypothesis can be proposed:
Hypothesis 1: Cybersecurity analysts can be abstracted from writing code and can focus solely on
formulating the right questions for artificial intelligence systems.</p>
      <p>Task automation is the primary purpose of systems based on artificial intelligence. Language
models have long been able to perform syntactic analysis and identify patterns in datasets and texts.
Large language models additionally offer advantages in semantic analysis: they capture underlying
meaning and context, which leads to higher accuracy.</p>
      <p>The OpenAI team has developed an intuitive Python SDK that can be installed with the pip
package manager. To get started, install it with the following command:</p>
      <preformat>pip install openai</preformat>
      <p>To use the OpenAI API, you’ll need an API key. You can register for an API key on the OpenAI
website https://openai.com and create a key in the API keys section.</p>
      <preformat>import openai
openai.api_key = "Your API Key"</preformat>
      <p>Replace the placeholder “Your API Key” with the API key obtained from the OpenAI platform page.
The user can now be prompted for a question with the input() function:</p>
      <preformat>question = input("What would you like to ask ChatGPT? ")</preformat>
      <p>The input() function prompts the user to enter the question they would like to send to the
ChatGPT API. It takes a string argument, which is displayed to the user when the program runs, and
returns the text that the user typed.</p>
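      <p>To pass the user’s question from a Python script to ChatGPT, use the completions function of the
OpenAI API:</p>
      <preformat>from openai import OpenAI

# Pass the key explicitly, or omit api_key to read the OPENAI_API_KEY environment variable
client = OpenAI(api_key="Your API Key")
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="What cyber security incidents do you know?",
)
# The generated text is in the first returned choice
print(response.choices[0].text)</preformat>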
      <p>The client.completions.create() function sends a request to the OpenAI API to generate a completion
for the given prompt. The model parameter selects the variant or version of the GPT model that
processes the request; here it is set to “gpt-3.5-turbo-instruct”. The prompt parameter contains the
text of the request, which in this scenario is the user’s question.</p>
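      <p>The request described above can be wrapped in a small helper so that the analyst only supplies the
question. This is a minimal sketch: the helper names build_completion_request and ask, and the
max_tokens value, are assumptions introduced for the example, and the client is passed in as a
parameter so that the helper can be exercised without network access.</p>
      <preformat>
```python
def build_completion_request(question, model="gpt-3.5-turbo-instruct", max_tokens=256):
    """Return the keyword arguments for client.completions.create()."""
    return {"model": model, "prompt": question.strip(), "max_tokens": max_tokens}


def ask(client, question):
    """Send the question through an OpenAI-style client and return the generated text."""
    response = client.completions.create(**build_completion_request(question))
    return response.choices[0].text.strip()


# Usage (requires the openai package and an OPENAI_API_KEY environment variable):
#   from openai import OpenAI
#   print(ask(OpenAI(), input("What would you like to ask ChatGPT? ")))
```
      </preformat>
      <p>Keeping the request construction separate from the network call makes it easy to inspect exactly
what text leaves the organization before it is sent.</p>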
      <p>
        By passing contextual information and questions to the function in text format, the responses will
also be obtained in text format. It’s essential to recognize that while ChatGPT performs well in
answering general questions and providing solutions for moderately complex problems, it’s not
infallible. In cases where complex problem-solving requires expert reasoning and context
understanding, artificial intelligence needs sufficient information to provide accurate tools for the task
at hand [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>Hypothesis 2: ChatGPT requires context and more information to understand a problem fully. In
some cases, it needs to go through a logical process to achieve the desired goal.</p>
      <p>Let’s demonstrate an example response from ChatGPT to a model problem to validate this
hypothesis (see Figure 3):</p>
      <p>Model Problem 1. "How do I know the IP address of the command-and-control console?"</p>
      <p>Clearly, this response is not useful for cybersecurity analysis. Let’s identify what is wrong with
the structure of the query and which key factors can be improved:
• Lack of sufficient context in the query.
• Absence of necessary information.
• No clear goal expected from the artificial intelligence.
• Lack of specific instructions for achieving the purpose of the query.</p>
      <p>People often forget the intricacies of their own thinking process and assume that artificial
intelligence will follow the expert’s line of reasoning, whereas it is the expert who must make that
reasoning explicit. When working with an LLM, context is a crucial component for improving the
results. Following this principle, let’s supplement the question with context in order to obtain a
more accurate answer.</p>
      <p>Model Problem 2. “Context: The expected outcome is Python code that will assist the
cybersecurity analyst during the investigation of ransomware attacks on the enterprise network. We
intend to analyze a file containing traffic data from a production Palo Alto firewall. The content is
already in Pandas DataFrame format, stored in a variable named ‘data’.”</p>
      <p>The response is shown in Figure 4. As you can see, the result is taking shape, but it is still far from
the intended goal. While one of the reasons for using artificial intelligence technology is to obtain
information, it is not the primary objective. The more information is provided in queries, the more
significant improvements are observed in the responses.</p>
      <p>Clearly, when there is a large volume of logs (for example, a 200 MB $MFT file), it is unrealistic
to send this information to an artificial intelligence service for processing. Aside from data
protection issues, the cost of using the service’s API would be significantly higher than planned.</p>
      <p>Instead, let’s describe in the request what the data for analysis looks like, providing as much
contextual information as possible so that the data itself does not need to be sent (see Figure 5). In
this case, we add to the ChatGPT request the list of columns that make up the log and a description of
each field.</p>
      <p>Model Problem 3. “Context: The expected output is Python code that will assist cybersecurity
analysts in their investigation of a ransomware attack on an enterprise network. We are going to
analyze a file containing traffic data from a Palo Alto firewall. The content is already in Pandas
DataFrame format in a variable called ‘data’. Columns in this DataFrame:
{fields}
And private ranges of IP addresses:
- 10.0.0.0 to 10.255.255.255
- 172.16.0.0 to 172.31.255.255
- 192.168.0.0 to 192.168.255.255”</p>
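      <p>The context of Model Problem 3 can be assembled programmatically, so that only column descriptions
and never the log data itself are sent. The following is a sketch under assumptions: the column names
and descriptions are hypothetical (the real field list is not given in the text), and the helper names
are introduced only for this example. The private ranges are expressed with the standard ipaddress
module.</p>
      <preformat>
```python
import ipaddress

# The three private ranges listed in the prompt, in CIDR notation
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

PROMPT_TEMPLATE = (
    "Context: The expected output is Python code that will assist cybersecurity "
    "analysts in their investigation of a ransomware attack on an enterprise network. "
    "The content is already in Pandas DataFrame format in a variable called 'data'. "
    "Columns in this DataFrame:\n{fields}\n"
    "And private ranges of IP addresses:\n{ranges}"
)


def is_private(address):
    """Return True if the address falls inside one of the listed private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


def build_context(columns):
    """Fill the placeholders with one 'name: description' line per column and one line per range."""
    fields = "\n".join(f"- {name}: {text}" for name, text in columns.items())
    ranges = "\n".join(f"- {net[0]} to {net[-1]}" for net in PRIVATE_NETWORKS)
    return PROMPT_TEMPLATE.format(fields=fields, ranges=ranges)


# Hypothetical column descriptions, for illustration only
columns = {
    "time": "timestamp of the connection",
    "src_ip": "source IP address",
    "dst_ip": "destination IP address",
}
print(build_context(columns))
```
      </preformat>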
      <p>The role of the analyst in such a process is important. What is needed is not just an understanding
of how the technology works, but an understanding of the problems and the ability to ask questions,
even when the analyst has abstracted from the intermediate process of analysis.</p>
      <p>Let's change the general question used earlier to a more specific prompt that can guide the analysis
process. Let's use the same context as before, but change the question (answer in Fig. 6):</p>
      <p>Model Problem 4. List the external IP addresses to which the most connections were made when
browsing the web between 6:00 PM and 8:00 AM, and show in a separate column the number of unique
source IP addresses that connected to each of these external IP addresses.</p>
      <p>Running the generated code returns only one IP address, which is a rather suspicious result. The
fact that only two source IP addresses are making HTTP connections during off-hours calls for a
detailed investigation.</p>
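      <p>For orientation, the kind of Pandas code that a request such as Model Problem 4 could yield is
sketched below. It is not the code returned by ChatGPT in the figure. The column names time, src_ip,
dst_ip and dst_port are hypothetical, web browsing is assumed to mean destination ports 80 and 443, and
the sample rows are invented for illustration.</p>
      <preformat>
```python
import ipaddress

import pandas as pd

PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_external(address):
    """Return True if the address is outside the listed private ranges."""
    ip = ipaddress.ip_address(address)
    return not any(ip in network for network in PRIVATE_NETWORKS)


def off_hours_web_destinations(data):
    """Count off-hours web connections and unique sources per external destination."""
    web = data[data["dst_port"].isin([80, 443])]              # assumed web ports
    web = web[web["dst_ip"].map(is_external).astype(bool)]    # external destinations only
    hours = pd.to_datetime(web["time"]).dt.hour
    web = web[~hours.between(8, 17)]                          # keep 18:00 through 07:59
    return (
        web.groupby("dst_ip")
        .agg(connections=("src_ip", "size"), unique_sources=("src_ip", "nunique"))
        .sort_values("connections", ascending=False)
        .reset_index()
    )


# Invented sample rows standing in for the firewall log
data = pd.DataFrame(
    {
        "time": [
            "2023-05-01 19:05:00",
            "2023-05-01 23:40:00",
            "2023-05-02 02:15:00",
            "2023-05-02 10:00:00",
            "2023-05-02 03:00:00",
        ],
        "src_ip": ["10.0.0.5", "10.0.0.7", "10.0.0.5", "10.0.0.9", "10.0.0.5"],
        "dst_ip": ["203.0.113.10", "203.0.113.10", "203.0.113.10", "198.51.100.4", "192.168.1.20"],
        "dst_port": [443, 443, 80, 443, 80],
    }
)
result = off_hours_web_destinations(data)
print(result)
```
      </preformat>
      <p>On this invented sample the function returns a single destination, 203.0.113.10, with three
connections from two distinct source addresses; the daytime connection and the internal destination
are filtered out.</p>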
      <p>The artificial intelligence must be told the correct way to achieve the desired result. In some
cases, analysts clearly understand the path that leads to the goal but should not have to spend time
searching for the Python functions that implement it. In other words, the analyst knows what is
needed but not how to code it, and this is where ChatGPT can be used to automate routine
operations.</p>
      <p>Model Problem 5. We want to find the external IP addresses that are probably command-and-control
consoles, so we will look for repeated connections throughout the day between pairs of IP addresses.
Follow these steps:
• Filter connections to external IP addresses.
• Extract the connection time excluding minutes.
• Group connections by source and destination IP address.
• For each group, count the number of unique connection-time values, sum the bits sent to the
destination IP address, and count the number of connections between the two IP addresses.</p>
      <p>The result of the request is shown in Figure 7.</p>
      <p>The result is a list of IP addresses that need to be analyzed, because the fact that there are
thousands of connections between two IP addresses during 13 different periods of the day (according
to the log) is suspicious activity, and can be classified as a potentially dangerous cyber incident.</p>
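      <p>The steps of Model Problem 5 map directly onto a Pandas group-by with several aggregations. The
sketch below is illustrative and is not the code shown in the figure: the column names time, src_ip,
dst_ip and bytes_sent are hypothetical stand-ins for the real log fields, and the sample rows are
invented.</p>
      <preformat>
```python
import ipaddress

import pandas as pd

PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_external(address):
    """Return True if the address is outside the listed private ranges."""
    ip = ipaddress.ip_address(address)
    return not any(ip in network for network in PRIVATE_NETWORKS)


def repeated_connections(data):
    """Summarize connections per source and destination pair for external destinations."""
    external = data[data["dst_ip"].map(is_external).astype(bool)].copy()
    # Connection time without minutes: keep only the hour of the day
    external["hour"] = pd.to_datetime(external["time"]).dt.hour
    return (
        external.groupby(["src_ip", "dst_ip"])
        .agg(
            active_hours=("hour", "nunique"),   # distinct hours with traffic
            bytes_sent=("bytes_sent", "sum"),   # total volume sent to the destination
            connections=("hour", "size"),       # number of connections in the pair
        )
        .sort_values(["active_hours", "connections"], ascending=False)
        .reset_index()
    )


# Invented sample rows standing in for the firewall log
data = pd.DataFrame(
    {
        "time": [
            "2023-05-01 01:10:00",
            "2023-05-01 01:50:00",
            "2023-05-01 07:20:00",
            "2023-05-01 15:30:00",
            "2023-05-01 09:00:00",
        ],
        "src_ip": ["10.0.0.5", "10.0.0.5", "10.0.0.5", "10.0.0.5", "10.0.0.7"],
        "dst_ip": ["203.0.113.10", "203.0.113.10", "203.0.113.10", "203.0.113.10", "10.0.0.1"],
        "bytes_sent": [100, 200, 300, 400, 500],
    }
)
summary = repeated_connections(data)
print(summary)
```
      </preformat>
      <p>Pairs that are active in many distinct hours and show a high connection count appear at the top of
the table and are the first candidates for manual review.</p>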
      <p>Next, we search for long connections in the traffic. Such connections need to be analyzed because
attackers increasingly use remote assistance tools such as TeamViewer to avoid detection and
maintain access to the organization’s network. For this, we use the following request:</p>
      <p>Model Problem 6. We want to create a graph of long connections between an internal IP address and
an external IP address by following the steps below.</p>
      <p>• Filter connections to external IP addresses.
• Sum the connection time for each destination IP address and store it in a separate column
called "sum_time".
• Count the number of connections made to each destination IP address and store it in a separate
"sum_conn" column.
• Filter and store results with "sum_conn" greater than 10.
• Keep only one row for each destination IP address.
• Divide the total connection time by the number of connections and store it in a column called
"avg_conn".
• Select the ten results with the highest "avg_conn" value.
• Create a graph using the matplotlib library, where the "x" axis is the number of connections
and the "y" axis is the total connection time.</p>
      <p>The response of the artificial intelligence system to the request is shown in Figure 9.</p>
      <p>Executing this program code produces the result shown in Figure 10.</p>
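      <p>As a point of comparison, the steps of Model Problem 6 can be sketched in Pandas as follows. This is
not the code from the figure. The column names dst_ip and duration are hypothetical, "avg_conn" is
interpreted here as total connection time divided by the number of connections (an assumption), and the
sample rows are invented. Matplotlib is imported inside the plotting function so that the table can be
built without it.</p>
      <preformat>
```python
import ipaddress

import pandas as pd

PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_external(address):
    """Return True if the address is outside the listed private ranges."""
    ip = ipaddress.ip_address(address)
    return not any(ip in network for network in PRIVATE_NETWORKS)


def long_connections(data, min_connections=10, top=10):
    """Build one row per external destination with sum_time, sum_conn and avg_conn."""
    external = data[data["dst_ip"].map(is_external).astype(bool)]
    table = (
        external.groupby("dst_ip")
        .agg(sum_time=("duration", "sum"), sum_conn=("duration", "size"))
        .reset_index()
    )
    table = table[table["sum_conn"] > min_connections].copy()
    # Assumed meaning of avg_conn: average duration of one connection
    table["avg_conn"] = table["sum_time"] / table["sum_conn"]
    return table.nlargest(top, "avg_conn").reset_index(drop=True)


def plot_long_connections(table):
    """Scatter plot: number of connections on x, total connection time on y."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(table["sum_conn"], table["sum_time"])
    for _, row in table.iterrows():
        ax.annotate(row["dst_ip"], (row["sum_conn"], row["sum_time"]))
    ax.set_xlabel("Number of connections")
    ax.set_ylabel("Total connection time")
    return fig


# Invented sample rows standing in for the firewall log
rows = []
rows += [{"src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "duration": 600}] * 12
rows += [{"src_ip": "10.0.0.7", "dst_ip": "198.51.100.4", "duration": 5}] * 20
rows += [{"src_ip": "10.0.0.9", "dst_ip": "192.0.2.8", "duration": 900}] * 3
data = pd.DataFrame(rows)

table = long_connections(data)
print(table)
```
      </preformat>
      <p>Calling plot_long_connections(table) then draws the graph; destinations with few but very long
connections stand out in the upper left of the plot.</p>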
    </sec>
    <sec id="sec-3">
      <title>3. Conclusions</title>
      <p>Data analysis using AI is becoming a key capability in virtually every field, and in cybersecurity
it is turning into an essential skill for analysts. Knowing how to use tools such as LLMs and other
artificial intelligence systems directly affects the effectiveness of incident investigation and
security monitoring. As cybersecurity grows in importance, analysts must be prepared to employ
contemporary methods and tools to safeguard data and networks.</p>
    </sec>
    <sec id="sec-4">
      <title>4. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Yanev</surname>
          </string-name>
          ,
          <source>Building AI Applications with ChatGPT APIs</source>
          . Packt Publishing Ltd. (
          <year>2023</year>
          ), 258 pp.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Olivier</given-names>
            <surname>Caelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Marie-Alice</given-names>
            <surname>Blete</surname>
          </string-name>
          ,
          <source>Developing Apps with GPT-4 and ChatGPT: Build Intelligent Chatbots, Content Generators, and More</source>
          . O'Reilly Media, Inc. (
          <year>2023</year>
          ), 183 pp.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Chou</surname>
          </string-name>
          ,
          <source>Mastering Python Networking</source>
          , 3rd edn. Packt Publishing (
          <year>2020</year>
          ). URL: https://www.perlego.com/book/1365840/mastering-python-networking-your-onestop-solution-tousing-python-for-network-automation-programmability-and-devops-3rd-edition-pdf
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Mark</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <source>Learning Python</source>
          , Fourth Edition. O'Reilly Media, Inc. (
          <year>2009</year>
          ), 1213 pp.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Ronald T.</given-names>
            <surname>Kneusel</surname>
          </string-name>
          ,
          <source>How AI Works: From Sorcery to Science</source>
          . No Starch Press (
          <year>2023</year>
          ), 192 pp.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Morgan</surname>
          </string-name>
          ,
          <article-title>ChatGPT Vs Bard: Which is better for coding?</article-title>
          URL: https://www.pluralsight.com/blog/software-development/chatgpt-vs-bard-coding
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Khusainov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Diblik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shatyrko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bastinec</surname>
          </string-name>
          .
          <article-title>Estimates of Solution Convergence Dynamical Processes in Neuronet with Time Delay</article-title>
          .
          <source>Conference Proceedings “IEEE ATIT 2019”</source>
          (
          <year>2019</year>
          ), pp.
          <fpage>411</fpage>
          -
          <lpage>414</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
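        <mixed-citation>
          [8]
          <string-name>
            <given-names>Andriy</given-names>
            <surname>Shatyrko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Denys</given-names>
            <surname>Khusainov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Oleksii</given-names>
            <surname>Bychkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Josef</given-names>
            <surname>Diblik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jaromir</given-names>
            <surname>Bastinec</surname>
          </string-name>
          .
          <article-title>Construction and Optimization of Stability Conditions of Learning Processes in Mathematical Models of Neurodynamics</article-title>
          .
          <source>CEUR Workshop Proceedings “IT&amp;I-2022”</source>
          (
          <year>2022</year>
          ), Vol.
          <volume>3384</volume>
          , pp.
          <fpage>42</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>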
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <article-title>Gartner Identifies the Top Cybersecurity Trends for 2023</article-title>
          . URL: https://www.gartner.com/en/newsroom/press-releases/04-12-2023-gartner-identifies-the-top-cybersecurity-trends-for-2023
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Irada</given-names>
            <surname>Dzhalladova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Miroslava</given-names>
            <surname>Ruzickova</surname>
          </string-name>
          ,
          <source>Dynamical system with random structure and their applications</source>
          . Cambridge Scientific Publishers (
          <year>2020</year>
          ), ISBN: 978-1-908106-66-7.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Kaminsky</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koval</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yereshko</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vdovenko</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bocharov</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kazancoglu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Evaluating the effectiveness of enterprises' digital transformation by fuzzy logic</article-title>
          .
          <source>Advances in soft computing applications</source>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>