<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Dmytro Lande</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Information Recording of National Academy of Sciences of Ukraine</institution>
          ,
          <addr-line>Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”</institution>
          ,
          <addr-line>Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <volume>1</volume>
      <issue>2</issue>
      <fpage>173</fpage>
      <lpage>183</lpage>
      <abstract>
<p>Modern challenges in cybersecurity require new approaches to information retrieval and data analysis. The growth of data volumes and the speed of their updates make traditional information processing methods insufficiently effective. This paper proposes the integration of large language models (LLMs) into information retrieval systems to enhance analytical capabilities and automate data processing tasks. The main goal of the research is to transfer the analytical component of the information retrieval system to LLMs, significantly improving the accuracy, completeness, and relevance of information searches. The system Cyber Aggregator, used for monitoring and analyzing social media content in the context of cybersecurity, demonstrates the effectiveness of the proposed approach. The integration of LLMs into Cyber Aggregator allows for the automation of semantic indexing processes, enhances the formulation and modification of user queries, and provides more precise summarization and analysis of search results. This includes creating analytical digests, identifying key events, constructing semantic maps, and conducting semantic analysis. The proposed methodology is based on leveraging the powerful capabilities of LLMs, such as understanding complex relationships between concepts, analyzing context, and automatically forming conclusions. The application of this technology in cybersecurity contributes to more effective threat monitoring, improved situational awareness, and enhanced real-time threat response capabilities. The paper also presents a UML diagram illustrating the key components of the system, along with a mathematical formalization of the main processes related to the integration of LLMs into information retrieval systems. The research findings indicate that the use of LLMs combined with information retrieval technologies opens new opportunities for automating data analysis and ensuring cybersecurity.
This makes the proposed approach an important tool for cybersecurity professionals engaged in open-source intelligence (OSINT) and other analytical tasks in today's information environment.</p>
      </abstract>
      <kwd-group>
        <kwd>Cybersecurity</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Data Analysis Automation</kwd>
        <kwd>Social Media Monitoring</kwd>
        <kwd>Semantic Analysis</kwd>
        <kwd>Cyber Aggregator</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid evolution of threats in cyberspace requires the use of advanced tools and methodologies
for timely detection and neutralization of threats. Open Source Intelligence (OSINT) is a key
component in cybersecurity, utilizing publicly available information to identify and minimize risks
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The emergence of generative artificial intelligence models, particularly large language models
(LLMs), opens up new opportunities for automating the collection, processing, and analysis of data
across various fields [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. In particular, paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] provides a comprehensive overview of the
application of LLMs in computational linguistics, while paper [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] outlines the fundamentals of
semantic networking, methodologies for forming semantic networks, and domain models through
engagement with LLMs.
      </p>
      <p>
        The authors of this work have already developed the Cyber Aggregator system [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which is used
for monitoring and analyzing content on social media, particularly in the context of cybersecurity.
This system has proven effective in collecting and processing large volumes of data from open
sources, enabling the prompt detection and analysis of relevant cyber threats.
      </p>
      <p>ORCID: 0000-0002-8585-1044 (O. Puchkov); 0000-0003-3945-1178 (D. Lande). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Based on Cyber Aggregator, the authors propose the implementation of new capabilities of large
language models (LLMs) for automating the analysis of textual data. The application of LLMs
significantly enhances the accuracy and completeness of searches, improves the formulation and
modification of user queries, and provides deeper and more precise analysis of results [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. This not
only reduces the volume of routine work but also increases the overall efficiency of the system,
allowing specialists to focus on more complex analytical tasks.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem Statement</title>
      <p>Despite advancements in information retrieval technologies, traditional methods often face issues
with the completeness and accuracy of the obtained information. This problem is particularly acute
in the field of cybersecurity, where timely access to relevant data is critically important. The challenge
lies in the need to develop more sophisticated systems that can intelligently process user queries,
identify the most significant information, and present it in a concise and understandable format.</p>
      <p>As the volume of information and the complexity of cyber threats increase, traditional methods of
information retrieval and analysis become less effective. In a context where cybercriminals employ
increasingly sophisticated attack methods, the need for rapid and accurate threat detection is critical.
This requires automation of the processing of large data volumes and enhancement of analytical
quality.</p>
      <p>
        Recently, large language models (LLMs) have demonstrated significant potential in improving the
quality of text information processing. The availability of open-source software and models like
LLama-2 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] opens new opportunities for their integration into closed corporate systems. This is
particularly relevant for systems dealing with cybersecurity issues, where ensuring reliable and
timely monitoring of the information space is a priority.
      </p>
      <p>Integrating such technologies into existing systems, such as Cyber Aggregator, can significantly
enhance the effectiveness of cyber threat detection, optimize information retrieval and analysis
processes, and automate the generation of analytical summaries and the construction of semantic
networks. This provides a new level of protection for information systems and enables more effective
responses to modern cyber threats.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Goal</title>
      <p>The main goal of this research is to develop and formalize a methodology that integrates LLM into a
social media monitoring system focused on cybersecurity to enhance the accuracy and relevance of
information retrieval. This methodology includes semantic indexing, query modification, and result
summarization performed using LLM. The research also aims to provide a clear mathematical
formalization of the processes involved in the proposed system.</p>
      <p>To achieve this goal, it is necessary to address the following tasks:
1. Development of a methodology for semantic indexing of textual data using LLM. This task
involves creating an algorithm that allows for the preprocessing of textual data by identifying
key concepts, their relationships, and forming an index for effective database searching.
2. Integration of LLM into the process of modifying user queries to enhance their accuracy and
completeness. The goal is to develop approaches for dynamic modification of user queries
based on semantic analysis of the text, which will provide more relevant information retrieval
results.
3. Development of algorithms for summarizing search results using LLM. This task includes
creating a methodology for automatically generating digests, summaries, and other analytical
products based on relevant documents obtained from the search.
4. Formalization of processes involved in the proposed methodology. The task is to develop
mathematical models and formalisms that accurately describe the stages of semantic indexing,
query modification, and result summarization.
5. Integration and testing of the proposed methodology in real conditions within the Cyber
Aggregator system. This task involves implementing the developed approaches into the
existing social media monitoring system, conducting test studies, and analyzing the results.</p>
    </sec>
    <sec id="sec-4">
      <title>4. System Architecture</title>
      <p>
        The architecture of a social media monitoring system for cybersecurity, integrated with large
language models (LLMs), consists of several key components, each performing specific functions and
interacting with other parts of the system to achieve a common goal. The main components of the
architecture include:
1. Data Collection Module: Responsible for aggregating data from various sources, such as social
media, forums, blogs, and other public platforms. This module ensures regular and efficient
collection of textual data, including real-time data, with the ability to pre-filter and clean the
data.
2. Database and Data Storage: Utilizes specialized data storage systems, such as Elasticsearch, to
store the collected information and ensure quick access to it [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The database is structured
to support efficient semantic indexing and searching, as well as scalability for handling large
volumes of information.
3. Semantic Indexing Module: Performs functions of text analysis and building semantic indexes
based on key concepts and their relationships [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Integration with LLM allows for the
creation of more complex and accurate indexes that consider the context and meanings of
words in different domains of knowledge.
4. Search Optimization Module: Uses LLM to modify user queries to improve search results. This
module automatically analyzes input queries, supplementing or refining them to ensure
maximum relevance and accuracy of the results.
5. Results Processing Module: Responsible for summarizing and analytically processing search
results. The application of LLM allows for the automatic creation of digests, analytical
summaries, detecting events, and constructing semantic maps to visualize relationships
between data.
6. User Interface: Provides user interaction with the system. The interface includes dashboards
for customizing queries, viewing search results, and obtaining analytical products in a
user-friendly format. Integration with LLM may also allow interaction through
chatbots or other interactive interfaces.
7. Security and Access Management Module: Ensures data protection and access management
to the system. This component is particularly important in the context of integration with
corporate systems, where strict cybersecurity requirements must be followed.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Methodology</title>
      <sec id="sec-5-1">
        <title>5.1. Stages of the Methodology</title>
        <p>The proposed methodology consists of three main stages:
1. Data Collection and Indexing:
Data collection: gathering data from various open sources using the CyberAggregator system.
Preprocessing: cleaning and normalizing data to prepare it for indexing.
Semantic indexing: using LLMs for semantic indexing, identifying key concepts and
relationships in the data. The indexed data is stored in an Elasticsearch database.
2. Query Processing:
Query analysis: analyzing and modifying user queries to improve completeness and accuracy.
The LLM offers synonyms, related terms, and alternative query structures.
Information retrieval: searching for relevant documents in the Elasticsearch database based
on the modified query.
3. Summary and Analysis of Results:
Summary: automatic generation of digests, summaries, semantic maps, and other analytical
materials using LLM.
Event detection and semantic map construction: identifying significant events and creating
semantic maps that visualize the connections between key concepts.</p>
        <p>Q′ = Q ∪ {t′₁, t′₂, …, t′ₖ}.</p>
        <p>sim(Q′, d) = Σ_{t ∈ Q′} w(t, d).</p>
        <p>A document d is considered relevant if sim(Q′, d) &gt; θ.</p>
        <p>D_rel = {d₁, d₂, …, dₘ} denotes the resulting set of relevant documents.</p>
        <sec id="sec-5-1-1">
          <title>6.3. Summary of Results</title>
          <p>The summarization process aggregates information from the relevant documents D_rel =
{d₁, d₂, …, dₘ} to create a set of summaries S = {s₁, s₂, …, sₘ}.</p>
          <p>Each summary is generated using the LLM (denoted as a function with the corresponding prompt),
which highlights the most significant points of the relevant document: sᵢ = LLM(dᵢ, promptᵢ).</p>
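          <p>As an illustration, the summarization step can be sketched with a stub in place of the real model call; in the actual system the function would call Llama with the corresponding prompt, so the stub and its first-sentence heuristic are purely hypothetical:</p>

```python
def llm(document, prompt):
    """Stub standing in for a real LLM call (e.g. a local Llama endpoint).

    Returns the first sentence as a trivial 'summary' so that the pipeline
    shape s_i = LLM(d_i, prompt_i) can be run end to end.
    """
    return document.split(". ")[0]

def summarize(relevant_docs, prompt="Summarize the key points."):
    # One summary per relevant document, as in the formalization above.
    return [llm(d, prompt) for d in relevant_docs]

docs = ["New botnet detected. It spreads via routers.",
        "Data breach reported. Millions of records leaked."]
summaries = summarize(docs)
# summaries == ["New botnet detected", "Data breach reported"]
```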
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Mathematical formalization</title>
      <sec id="sec-6-1">
        <title>6.1. Semantic indexing</title>
        <p>Let there be a set of documents D = {d₁, d₂, …, dₙ} and a set of terms T = {t₁, t₂, …, tₘ} used for
indexing. The indexing process assigns a weight wᵢⱼ to each term tⱼ in the document dᵢ, which can
be formalized as follows:
Index(dᵢ) = {(tⱼ, wᵢⱼ) | tⱼ ∈ T}.</p>
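        <p>A minimal sketch of this indexing scheme, assuming plain term-frequency weights as a simplified stand-in for the LLM-derived semantic weights described in the text (the document and term set are invented examples):</p>

```python
from collections import Counter

def build_index(document, terms):
    """Assign a weight w(t, d) to each indexing term t found in document d.

    Simplified stand-in: raw term frequency. In the paper the weights come
    from LLM-based semantic analysis; counts are used here only to make the
    structure Index(d) = {(t, w) | t in T} concrete.
    """
    counts = Counter(document.lower().split())
    return {t: counts[t] for t in terms if counts[t] > 0}

doc = "phishing attack targets bank users via phishing emails"
terms = {"phishing", "attack", "malware"}
index = build_index(doc, terms)
# index == {"phishing": 2, "attack": 1}  (absent terms are omitted)
```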
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Modification of queries</title>
        <p>For the user query Q = {q₁, q₂, …, qₖ}, the LLM modifies the query by expanding it with additional
relevant terms T′, forming the extended query Q′ = Q ∪ T′.</p>
        <p>The relevance of the document d to the query Q′ is assessed using a similarity function sim(Q′, d).
A document is considered relevant if its similarity score exceeds a defined threshold θ: sim(Q′, d) &gt; θ.</p>
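        <p>The expansion and relevance check can be sketched as follows; the expansion table here is a hypothetical stand-in for the synonyms and related terms an LLM would suggest, and the weights reuse the index structure from Section 6.1:</p>

```python
def expand_query(query, expansions):
    """Form the extended query Q' = Q with additional related terms T'.

    `expansions` maps a query term to related terms; in the paper these
    come from the LLM, here they are a hand-made illustrative table.
    """
    extended = set(query)
    for term in query:
        extended.update(expansions.get(term, []))
    return extended

def sim(extended_query, doc_index):
    """Similarity of a document to Q': sum of index weights of shared terms."""
    return sum(doc_index.get(t, 0) for t in extended_query)

query = {"ransomware"}
expansions = {"ransomware": ["extortion", "lockbit"]}  # hypothetical LLM output
q_ext = expand_query(query, expansions)
doc_index = {"ransomware": 2, "lockbit": 1}
relevant = sim(q_ext, doc_index) > 1   # threshold θ = 1
```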
      </sec>
      <sec id="sec-6-3">
        <title>6.4. Detection of connections</title>
        <p>Construction of the term relationship matrix M = [mᵢⱼ], where the element mᵢⱼ reflects the strength
of the connection between terms tᵢ and tⱼ.</p>
        <p>1. Calculation of values in the matrix: mᵢⱼ can be computed, for example, as the frequency of
co-occurrence of the terms tᵢ and tⱼ in the analyzed documents.</p>
      </sec>
      <sec id="sec-6-4">
        <title>6.5. Formation of the network</title>
        <sec id="sec-6-4-1">
          <title>1. Creating a graph:</title>
          <p>We will create a term matrix M of size n × n, where n is the number of terms in the
document.</p>
          <p>The element mᵢⱼ of this matrix defines the relationship between terms tᵢ and tⱼ.
The significance mᵢⱼ can be defined as the frequency of co-occurrence of terms tᵢ and tⱼ in
a document d. This can be implemented by counting how many times the terms appear in
the same context, or through metric values such as mutual information.
Let G = (V, E) be a graph, where V is the set of vertices (terms), and E is the set of edges
(connections between terms).</p>
        </sec>
        <sec id="sec-6-4-2">
          <title>2. Definition of nodes and edges:</title>
          <p>Vertices V correspond to terms t from the set Index(d). Edges E connect pairs of terms tᵢ and tⱼ
if mᵢⱼ exceeds a certain threshold θ.</p>
        </sec>
        <sec id="sec-6-4-3">
          <title>3. Formation of the edge set:</title>
          <p>E = {(tᵢ, tⱼ) | mᵢⱼ &gt; θ},
where θ is the significance threshold that determines which connections between terms are
substantial.</p>
        </sec>
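        <p>A simplified sketch of building such a term network, assuming document-level co-occurrence counts as the significance measure mᵢⱼ (documents and terms are invented examples):</p>

```python
from itertools import combinations
from collections import Counter

def term_network(documents, terms, threshold):
    """Build the edge set E = {(ti, tj) | m_ij > threshold}.

    m_ij is taken as the number of documents in which terms ti and tj
    co-occur, one of the significance measures mentioned in the text.
    """
    cooc = Counter()
    for doc in documents:
        present = sorted(t for t in terms if t in doc.lower().split())
        for a, b in combinations(present, 2):
            cooc[(a, b)] += 1
    return {pair for pair, m in cooc.items() if m > threshold}

docs = ["phishing email campaign", "phishing email wave", "malware dropper"]
edges = term_network(docs, {"phishing", "email", "malware"}, threshold=1)
# ('email', 'phishing') co-occurs in 2 documents, exceeding the threshold of 1
```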
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Implementation</title>
      <p>As a result of integrating the CyberAggregator system with the large language model Llama,
significant improvements have been achieved in the system's analytical capabilities in the areas of
social media monitoring and information retrieval. In this section, we will explore how Llama's new
features enhance various operational modes of CyberAggregator, including information search,
dynamic analysis, digest generation, and network construction.</p>
      <sec id="sec-7-1">
        <title>7.1. Information Summaries (Digests)</title>
        <p>The combination of search technology with the capabilities of Llama enables the automatic analysis
of news reports and the creation of summaries. The Llama model allows the Cyber Aggregator system
to generate detailed information digests:
1. The system automatically generates digests by processing large volumes of news,
identifying key events and facts, and creating a concise overview of the main events based
on them.
2. The linguistic capabilities of Llama help to better understand the context and meaning of
events, improving the accuracy and usefulness of the digests for users.</p>
      </sec>
      <sec id="sec-7-2">
        <title>7.2. Networks of Hacker Groups</title>
        <p>The Cyber Aggregator system, enhanced with Llama capabilities, effectively visualizes the
connections between hacker groups:
1. Detection of connections between groups. Llama assists in identifying and analyzing the
connections between different hacker groups, their activities, involvement in
cyberattacks, and their relationships with law enforcement agencies of specific states.
2. Information visualization. Integration with Llama allows for the automatic creation of
visualizations that depict the connections between groups, facilitating analysis and the
detection of patterns in their activities.</p>
      </sec>
      <sec id="sec-7-3">
        <title>7.3. Term networks</title>
        <p>The functional capabilities of Llama also enable the analysis and construction of term networks:
1. Term analysis. The model helps automatically identify key terms and their relationships,
enabling a better understanding of the context of the information.
2. Building semantic networks. Using Llama, semantic networks can be created to visualize
the connections between terms and concepts, facilitating the understanding of complex
concepts and their interrelations.</p>
      </sec>
      <sec id="sec-7-4">
        <title>7.4. Personal Networks</title>
        <p>AI systems leverage their linguistic capabilities to create and analyze networks of individuals involved
in cyber warfare. This process begins with analyzing individuals' activities, where an LLM model
helps detect connections based on their social media interactions and mentions in various sources.
By examining this data, the model identifies both direct and indirect relationships among individuals.
This approach enables the formation of more accurate and comprehensive networks, providing
deeper insights into the dynamics of cyber warfare actors.</p>
        <p>To identify the actors involved in the world's first cyberwar, a methodology has been proposed
for analyzing selected documents available in electronic sources on the Internet, using a generative
artificial intelligence system.</p>
        <p>At the first step of the methodology, a query is formed for the search aggregator, such as
CyberAggregator, using the keywords that a document must contain to be selected for further
analysis. After finding a sufficient number of text messages, these documents are filtered using
LLM-generated code, for example, in Python, to search for pairs of concepts formatted as “First Last
Name”.</p>
        <p>In the next step, the filtering of the provided phrases is carried out. The information is converted
into a PDF file, and a prompt is formulated for the LLM with the following wording:
“Extract names and surnames from the given file, ignoring proper names and
organization names.”</p>
        <p>In our case, approximately 700 names were extracted from over 30,000 phrases. To optimize the
construction of the network, a software code was developed in Python, which counts the number of
occurrences and removes all appearances except for the first one, as well as eliminates words that are
mentioned less than a specified number of times (in our case, 3), as they lack statistical significance
and only clutter the network with unnecessary information. Connections between actors are created
using ChatGPT:</p>
        <p> Find connections between characters linked by their activities to build a cohesive
network, and use all names in the connections in the format “character1; character2”.</p>
        <p>In the third step, after establishing connections between participants in the specified format, the
obtained information is recorded in a CSV file.</p>
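        <p>A minimal sketch of the frequency filtering and CSV recording described in these steps; the threshold of 3 follows the text, while the names and file name are invented examples:</p>

```python
import csv
from collections import Counter

def filter_names(names, min_count=3):
    """Keep each extracted name once, dropping names mentioned fewer than
    min_count times (3 in the paper), since rare names lack statistical
    significance and only clutter the network."""
    counts = Counter(names)
    return [name for name in counts if counts[name] >= min_count]

def write_pairs(pairs, path):
    """Record 'character1; character2' connections in a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerows(pairs)

names = ["Ivan Petrov"] * 4 + ["John Doe"] * 2 + ["Anna Kovach"] * 3
kept = filter_names(names)   # 'John Doe' is dropped (only 2 mentions)
write_pairs([("Ivan Petrov", "Anna Kovach")], "connections.csv")
```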
        <p>
          In the fourth and final step, using a special software application developed based on the GraphViz
library [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], a graphical representation of the cyberwar actors and their connections is created (Fig.
1).
        </p>
        <p>Let's provide the mathematical formalization of the method for detecting cybersecurity subjects.</p>
      </sec>
      <sec id="sec-7-5">
        <title>7.4.1. Initial assumptions</title>
        <p>Document set D = {d₁, d₂, …, dₙ} — a collection of documents obtained through OSINT systems
based on thematic queries.</p>
        <p>Hacker group set H — a set of names of hacker groups that need to be identified from the document
texts.</p>
        <p>Contextual connections C — a set of connections between hacker groups extracted from the
document texts.</p>
      </sec>
      <sec id="sec-7-6">
        <title>7.4.2. Step 1: Formation of the publication information array</title>
        <p>For each set of thematic queries Q (for example, queries based on cyberattacks in Ukraine or Israel),
we obtain a set of documents D that correspond to these queries.</p>
        <p>D = ⋃_{q ∈ Q} F(q),
where F(q) is a function that returns the set of documents D_q for the thematic query q.</p>
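        <p>This step can be sketched as a set union over per-query results; the fetch function and corpus below are hypothetical stand-ins for the CyberAggregator search backend:</p>

```python
def form_document_set(queries, fetch):
    """D = union of F(q) over thematic queries q, where fetch(q) stands in
    for the aggregator's search function F."""
    documents = set()
    for q in queries:
        documents.update(fetch(q))
    return documents

# Hypothetical stand-in for the CyberAggregator search backend.
corpus = {"cyberattacks Ukraine": {"doc1", "doc2"},
          "cyberattacks Israel": {"doc2", "doc3"}}
docs = form_document_set(corpus.keys(), lambda q: corpus[q])
# docs == {'doc1', 'doc2', 'doc3'}
```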
      </sec>
      <sec id="sec-7-7">
        <title>7.4.3. Step 2: Extraction of hacker group names</title>
        <p>For each document d ∈ D, we create the corresponding prompt for the ChatGPT system to extract
the names of hacker groups:
H(d) = ChatGPT(d, prompt),
where H(d) is the set of hacker groups extracted from document d, and prompt is the substantive
query to the ChatGPT system.</p>
      </sec>
      <sec id="sec-7-8">
        <title>7.4.4. Step 3: Building a network of connections</title>
        <p>Based on the extracted names of hacker groups, we form a set of contextual connections for each
document:
C(d) = {(hᵢ, hⱼ) | hᵢ, hⱼ ∈ H(d)},
where C(d) is the set of paired connections between hacker groups from document d. The
overall set of connections for all documents is defined as:
C = ⋃_{d ∈ D} C(d).</p>
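        <p>A sketch of forming the pairwise connections, assuming the per-document group sets H(d) have already been extracted (the group names are invented examples, not extraction results):</p>

```python
from itertools import combinations

def connections(groups_per_doc):
    """C = union over documents of C(d) = {(hi, hj) | hi, hj in H(d)}."""
    edges = set()
    for groups in groups_per_doc:
        # Sorting gives a canonical order, so (a, b) and (b, a) collapse.
        edges.update(combinations(sorted(groups), 2))
    return edges

# Hypothetical H(d) per document, as would be returned by the extraction step.
extracted = [{"Sandworm", "Killnet"}, {"Killnet", "Anonymous", "Sandworm"}]
c = connections(extracted)
# {('Killnet', 'Sandworm'), ('Anonymous', 'Killnet'), ('Anonymous', 'Sandworm')}
```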
      </sec>
      <sec id="sec-7-9">
        <title>7.4.5. Step 4: Visualization and analysis of the network</title>
        <p>The network of connections between hacker groups, constructed based on the set of groups H and
the set of connections C, can be represented as a graph G = (V, E), where:
V — the set of vertices (hacker groups);
E — the set of edges (contextual connections between the groups).</p>
      </sec>
      <sec id="sec-7-10">
        <title>7.4.6. Computational Complexity</title>
        <p>1. Formation of an Information Array of Publications:
The complexity depends on the number of queries |Q| and the number of documents N. The
complexity of forming the set of documents can be estimated as O(|Q| × N).</p>
        <p>2. Extraction of Hacker Group Names:
For each document d, a request is made to the ChatGPT system. Let T be the average
processing time for one request to the system. Then, the total complexity of this stage is
O(N × T).</p>
        <p>3. Construction of a Network of Connections:
For each document d, the connections between the groups are extracted. If |H(d)| hacker
groups are found in the document d, the number of connections between them can be
estimated as O(|H(d)|²). The overall complexity of constructing the network will be
O(Σ_{d ∈ D} |H(d)|²).</p>
        <p>4. Visualization and Analysis of the Network:
The complexity of visualization depends on the number of vertices |V| and edges |E| in
the graph G = (V, E). In the worst case, the complexity of visualization and analysis can
be estimated as O(|V| + |E|).</p>
        <p>Taking into account all stages, the overall complexity of the algorithm is:
O(|Q| × N + N × T + Σ_{d ∈ D} |H(d)|² + |V| + |E|).</p>
        <p>This method allows for the effective extraction and analysis of relationships between hacker
groups based on data from textual sources, using generative artificial intelligence tools.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Usage</title>
      <p>Thus, the proposed approaches enabled:
1. The system successfully identified and summarized key events occurring in the field of
cybersecurity.
2. It automatically created analytical digests from a large volume of documents.
3. Semantic maps were constructed to visualize the relationships between key concepts in
cybersecurity.</p>
      <p>The integration of Llama into the CyberAggregator system significantly improved the quality and
accuracy of information retrieval and analytical processes. The system is now capable of
automatically generating more detailed and useful informational digests, creating accurate networks
of individuals and groups, and conducting deeper semantic analysis of terms. These enhancements
contribute to increased efficiency in detecting important events and patterns within large volumes of
information, which is critically important for ensuring cybersecurity.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Discussion</title>
      <p>The proposed methodology has shown significant potential in enhancing information retrieval
systems in the context of cybersecurity. The integration of LLMs provides a deeper understanding of
user queries and enables the retrieval of more relevant and high-quality information. This opens up
new opportunities for automating OSINT processes and improving the efficiency of cybersecurity
analysts.</p>
    </sec>
    <sec id="sec-conclusion">
      <title>10. Conclusion</title>
      <p>In this study, a methodology for integrating large language models (LLMs) into a social media
monitoring system focused on cybersecurity has been developed and formalized. The main objective
was to improve the accuracy and relevance of information retrieval by implementing new capabilities
of LLMs into the CyberAggregator system. The results of the research confirm the success of
achieving this goal.</p>
      <p>The integration of information retrieval technologies and artificial intelligence has great potential
in the field of cybersecurity. The proposed system demonstrates how LLMs can be used to enhance
the accuracy and completeness of information retrieval, as well as to automatically summarize results.
In the future, the development of this system may lead to the creation of more advanced tools for
OSINT, enabling better responses to modern threats in cyberspace.</p>
      <p>The methodology developed in the study involves several key stages. It begins with semantic
indexing, where LLM is used for the automatic analysis and classification of textual data. The Llama
model excels in accurately recognizing terms and their relationships, which helps create a
high-quality information index that greatly improves search efficiency. Next, query modification takes
advantage of Llama's linguistic capabilities to automatically refine user queries, enhancing the
accuracy and completeness of the search results by adjusting them according to the context and
specifics of the requested information. Finally, result summarization employs Llama to generate
concise summaries and digests based on the search results, offering a clear and understandable
presentation of the key facts and events.</p>
      <p>The research delves into the mathematical formalization of processes within the proposed system.
The Llama model enhances semantic indexing by employing algorithms for semantic analysis that
create indexes based on vector representations of words and their contexts. This involves
constructing a term-document matrix, where both terms and documents are depicted as vectors in a
multidimensional space.</p>
      <p>For modifying queries, the mathematical formalization incorporates algorithms designed to refine
queries through contextual analysis. This is achieved by optimizing a query's utility function, which
allows the model to automatically adjust queries for greater accuracy in results.</p>
      <p>Additionally, the generalization of outcomes, such as digests and summaries, is accomplished
through text generation algorithms that leverage clustering and data summarization techniques. This
process aggregates information while taking into account its significance and context.</p>
      <p>The tasks defined in the research objective were implemented as follows: The integration of Llama
into the CyberAggregator system allowed for the automation of the analysis of large volumes of
textual data. The model automatically processes news messages, creates summaries, and generates
reports, which enhances the speed and accuracy of analytical processes. The application of Llama has
led to a significant improvement in the accuracy of searches and the relevance of results. The model
adapts queries according to the context and specifics of the requested information, enabling the
retrieval of more precise and useful results. The developed mathematical models provide a clear
understanding and implementation of the processes within the system, allowing for the enhancement
of its functionality and integration with Llama.</p>
      <p>The developed methodology and integration of Llama into CyberAggregator have a significant
impact on the practical application of the system in the field of cybersecurity. It enables effective
monitoring and analysis of social media, as well as automatic responses to emerging threats and
trends in the information space. This enhances the system's ability to predict and detect potential
cyberattacks and threats, which is critical for ensuring cybersecurity.</p>
    </sec>
    <sec id="sec-10">
      <title>References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Meredith</surname>
          </string-name>
          .
          <article-title>The OSINT Handbook: A practical guide to gathering and analyzing online information</article-title>
          ., Birmingham, UK: Packt Publishing,
          <year>2024</year>
          . 198 p.
          <source>ISBN: 1837638276</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wolfram</surname>
          </string-name>
          . What Is ChatGPT Doing ... and Why Does It Work? Wolfram Media, Inc.,
          <year>2023</year>
          . ISBN: 978-1-57955-081-3, 978-1-57955-082-0.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <source>Chat GPT AI Revolution 2023: A Guide to GTP Chat Technology and Its Social Impact</source>. Technology Summary, <year>2023</year>. 64 p. ISBN 979-837-7089-14-8.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><given-names>N.</given-names> <surname>Karanikolas</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Manga</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Samaridi</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Tousidou</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Vassilakopoulos</surname></string-name>. <article-title>Large Language Models versus Natural Language Understanding and Generation</article-title>. <source>PCI 2023: 27th Pan-Hellenic Conference on Progress in Computing and Informatics</source>, Lamia, Greece, November <year>2023</year>. DOI: https://doi.org/10.1145/3635059.3635104</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><given-names>Dmytro</given-names> <surname>Lande</surname></string-name>, <string-name><given-names>Leonard</given-names> <surname>Strashnoy</surname></string-name>. <source>GPT Semantic Networking: A Dream of the Semantic Web - The Time is Now</source>. Kyiv: Engineering, <year>2023</year>. 168 p. ISBN 978-966-2344-94-3.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><given-names>Dmytro</given-names> <surname>Lande</surname></string-name>, <string-name><given-names>Olexander</given-names> <surname>Puchkov</surname></string-name>, <string-name><given-names>Ihor</given-names> <surname>Subach</surname></string-name>. <article-title>Method of Detecting Cybersecurity Objects Based on OSINT Technology</article-title>. <source>Selected Papers of the XXII International Scientific and Practical Conference "Information Technologies and Security" (ITS 2022)</source>, Vol-<volume>3503</volume>, pp. <fpage>115</fpage>-<lpage>124</lpage>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] <string-name><given-names>ChengXiang</given-names> <surname>Zhai</surname></string-name>. <article-title>Large Language Models and Future of Information Retrieval: Opportunities and Challenges</article-title>. <source>SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>, pp. <fpage>481</fpage>-<lpage>490</lpage>. DOI: https://doi.org/10.1145/3626772.3657848</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] <string-name><given-names>S.</given-names> <surname>Kukreja</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Purohit</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Dasgupta</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Guha</surname></string-name>. <article-title>A Literature Survey on Open Source Large Language Models</article-title>. <source>ICCMB '24: Proceedings of the 2024 7th International Conference on Computers in Management and Business</source>, pp. <fpage>133</fpage>-<lpage>143</lpage>. DOI: https://doi.org/10.1145/3647782.3647803</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] <string-name><given-names>F.</given-names> <surname>Castanedo</surname></string-name>. <source>Run Llama-2 Models</source>. O'Reilly Media, Inc., <year>2023</year>. ISBN 9781098163198.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] <string-name><given-names>Pranav</given-names> <surname>Shukla</surname></string-name>, <string-name><given-names>Sharath</given-names> <surname>Kumar M N</surname></string-name>. <source>Learning Elastic Stack 7.0: Distributed Search, Analytics, and Visualization Using Elasticsearch, Logstash, Beats, and Kibana</source>, 2nd Edition. Packt Publishing, <year>2019</year>. 474 p. ISBN 9781789958539, 1789958539.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] <string-name><given-names>J.</given-names> <surname>Gavilanes</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Bozhilov</surname></string-name>, <string-name><given-names>U.</given-names> <surname>Dodeja</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Valtas</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Badrajan</surname></string-name>. <article-title>Use of LLM for Methods of Information Retrieval</article-title>. Report of University of Twente, <year>2024</year>. Available: https://bachelorshowcase-eemcs.apps.utwente.nl/content/TytQHsvY/Design_report.pdf</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] <string-name><given-names>Tamilla</given-names> <surname>Triantoro</surname></string-name>. <article-title>Graph Viz: Exploring, Analyzing, and Visualizing Graphs and Networks with Gephi and ChatGPT</article-title> (March 30, <year>2023</year>). ODSC Community.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>