<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CyKG-RAG: Towards knowledge-graph enhanced retrieval augmented generation for cybersecurity</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kabul Kurniawan</string-name>
          <email>kabul.kurniawan@wu.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elmar Kiesling</string-name>
          <email>elmar.kiesling@wu.ac.at</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Ekelhart</string-name>
          <email>andreas.ekelhart@univie.ac.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Austrian Center for Digital Production (ACDP)</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SBA Research</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Vienna</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>WU Wien, Institute for Data, Process and Knowledge Management</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>Cybersecurity threat detection and analysis have become increasingly prominent research areas in recent years. As in other fields, the rise of generative AI and Large Language Models (LLMs) has created new opportunities to advance cybersecurity, but has also highlighted some of the key challenges LLMs suffer from, including hallucinations, knowledge deficiencies, and a limited ability to handle factual information. To tackle such limitations, Retrieval Augmented Generation (RAG), which dynamically retrieves relevant information from external sources to enhance the capabilities of LLMs, has shown promise in many domains. However, naïve RAG approaches typically operate on text and do not consider symbolic representations, conceptual meaning, and relations in the data. In the cybersecurity domain, such RAG approaches cannot account for pivotal aspects such as network structure, threat and attack patterns, and intricate factual security knowledge, which are essential to identify suspicious activities and combine clues to discover and reconstruct attacks. To address this challenge, we propose CyKG-RAG, a novel framework that integrates Knowledge Graphs (KGs) with the RAG approach and is tailored to improve cybersecurity detection and analysis. Our framework utilizes the rich semantic relationships and structured data within KGs to provide contextually relevant information, thereby enhancing the accuracy and reliability of cyber threat detection. We validate our approach through real-world use cases, which demonstrate its effectiveness and show promising results for improving cybersecurity measures.</p>
      </abstract>
      <kwd-group>
        <kwd>RAG</kwd>
        <kwd>LLM</kwd>
        <kwd>Cybersecurity</kwd>
        <kwd>Knowledge Graph</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge-driven approaches have gained prominence in cybersecurity, facilitating the standardization
of terminology and the automation of decision-making processes through the use of ontologies and
knowledge graphs. As a result, a number of formal knowledge representations in cybersecurity (e.g., [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6">1,
2, 3, 4, 5, 6</xref>
        ]) and methodologies for the construction of knowledge graphs for cybersecurity [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] are now
available.
      </p>
      <p>
        Despite the active development of Semantic Web technologies and the increasing adoption of
graph-based methods in various domains, a notable discrepancy persists between the competencies of
cybersecurity professionals and their capacity to effectively utilize semantic technologies in security processes.
Enabling security professionals to access, query, and leverage knowledge graphs
without requiring in-depth expertise in semantic technologies has long been a challenge. This represents
a significant gap, as formal knowledge representations have the potential to alleviate several critical
challenges facing cybersecurity professionals, including (i) the global shortage of skilled cybersecurity
personnel<sup>1,2</sup>, (ii) a steep learning curve [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], (iii) the overwhelming volume of data leading to analyst
fatigue and overload<sup>3,4</sup>, and (iv) the rapidly evolving threat landscape requiring constant vigilance and
adaptability<sup>5</sup>.
      </p>
      <p>
        With the advent of Large Language Models (LLMs) such as ChatGPT, LLama, Claude, and Gemini [
        <xref ref-type="bibr" rid="ref10 ref9">9,
10</xref>
        ], researchers have started to explore how these models can improve cybersecurity tasks [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. However,
these models face serious limitations. They often struggle with complex reasoning tasks and with maintaining
consistency across multiple queries, making it difficult to rely on them for complicated cybersecurity
challenges [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In addition, they often fail when dealing with factual data sets, sometimes introducing
irrelevant or incorrect information, a problem commonly referred to as “hallucination”, where the
model generates content that is not supported by evidence [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This raises significant trust issues, as
ensuring the quality and reliability of information generated by LLMs remains a formidable challenge,
especially in a domain as critical as cybersecurity.
      </p>
      <p>
        To address these issues in the cybersecurity context, we investigate the integration of domain-specific
knowledge graphs with LLMs. Our goal is to support security analysts in tasks such as log analysis
and attack detection. Among the various strategies for integrating knowledge into LLMs, including
fine-tuning, prompt engineering, and knowledge editing, we focus on Retrieval-Augmented Generation
(RAG) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. This technique enables LLMs to answer user queries over data sets without the need for
additional model training. RAG was originally designed for scenarios where answers are contained in
specific regions of text. However, it often struggles to synthesize insights from disparate but related
pieces of information. Moreover, large volumes of text can exceed the limits of LLM context windows, and
expanding such windows may not be sufficient, as information can be “lost in the middle” of
longer contexts [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Furthermore, RAG encounters difficulties when tasked with understanding and
summarizing semantic concepts across large data sets or voluminous documents (e.g., logs), as is often
the case in cybersecurity. These challenges underscore the need for formal knowledge representations,
which are well suited for defining structured concepts, their relationships, and associated metadata.
      </p>
      <p>In this paper, we present CyKG-RAG, a RAG approach that integrates cybersecurity knowledge
represented as graphs. Our framework leverages the rich semantic relationships and structured data
within KGs to provide contextually relevant information. We validate our approach through real-world
use cases.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Two strands of research are of particular interest in the context of this paper: (i) modeling of knowledge
in the cybersecurity domain, and (ii) graph-informed RAG approaches.</p>
      <p>
        Knowledge Graphs/Ontologies in Cybersecurity. Formal knowledge representations have been
a topic of extensive research in the cybersecurity domain. Early research in this field focused on the high-level
conceptualization of information security knowledge in ontologies (e.g., [
        <xref ref-type="bibr" rid="ref1 ref16 ref17 ref18">16, 1, 17, 18, 19, 20, 21</xref>
        ]). These
ontologies typically comprise a set of fundamental concepts, such as asset, threat, vulnerability, and
countermeasure. In addition, many ontologies specialized for particular application domains (e.g., risk
management, incident management) have been released. A comprehensive survey and classification of
security ontologies can be found in [22].
      </p>
      <p>
        More recently, a number of initiatives have aimed at developing security ontologies that cover
information sharing standards such as CVE, CVSS, and CAPEC. These standards serve as a vital source
of contemporary cybersecurity knowledge and enable the exchange of vulnerability information. As part
of a research project (STUCCO), [23] presents an approach for a cybersecurity knowledge graph that
integrates information from both structured and unstructured data sources. The objective in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is
to integrate heterogeneous knowledge schemas from various cybersecurity systems and standards
and to create a Unified Cybersecurity Ontology (UCO) that aligns CAPEC, CVE, CWE, STIX, Trusted
Automated eXchange for Indicator Information (TAXII), and ATT&amp;CK. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] took a more comprehensive
approach, presenting not only a cybersecurity ontology but also addressing the dynamic nature of
the cybersecurity domain and introducing an ETL workflow that updates the knowledge graph as
new information becomes available. Moreover, the resulting knowledge graph can be easily linked to locally available data and
incorporated into operational scenarios. Open-access Cybersecurity Knowledge Graphs (CSKGs) include
CSKG [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Open-CyKG [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], ATT&amp;CK-KG [24] and the SEPSES-CSKG [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. For further details on CSKGs,
we refer to recent surveys [25, 26, 27, 28].
      </p>
      <p><sup>3</sup> https://www.forbes.com/sites/edwardsegal/2021/11/08/alert-fatigue-can-lead-to-missed-cyber-threats-and-staff-retentionrecruitment-issues-study/</p>
      <p><sup>4</sup> https://www.openaccessgovernment.org/fighting-alert-fatigue-and-building-resilient-cybersecurity-strategies/139904</p>
      <p><sup>5</sup> https://www.weforum.org/agenda/2024/01/cybersecurity-ai-frontline-artificial-intelligence/</p>
      <p>LLM and Retrieval Augmented Generation. In Retrieval-Augmented Generation (RAG), relevant
information is retrieved from external sources and incorporated into the context window of large
language models (LLMs). The framework presented in [29] outlines three distinct RAG paradigms:
Naïve RAG, Advanced RAG, and Modular RAG. Naïve RAG approaches are typically based on the
conversion of documents into text, the splitting of text into chunks, and the embedding of these chunks
into a vector space. Advanced RAG systems additionally incorporate
pre-retrieval and post-retrieval strategies to enhance retrieval quality. Modular RAG systems introduce
iterative and dynamic cycles of interleaved retrieval and generation, thereby enhancing the flexibility
and effectiveness of the retrieval process.</p>
      <p>The intersection of graphs and LLMs in RAG represents an emerging research area with significant
contributions in knowledge graph creation [30], completion [31], and causal graph extraction [32, 33].
Advanced RAG systems have also begun to use graph-based indexing [34]. [35] present the G-Retriever
method, which enables users to ask questions about a graph via a conversational interface. The method
can be fine-tuned to enhance graph understanding via soft prompting. Another approach to add graph
reasoning abilities to existing LLMs is Graph-ToolFormer [36], which aims to enable graph learning.
Furthermore, SURGE [37] supports context-relevant and knowledge-grounded dialogues with a KG.
In comparison to other knowledge-grounded methods, it employs contrastive learning to guarantee
that the generated texts exhibit a high degree of similarity to the retrieved subgraphs. An important
approach for our work is FABULA [38], a system to automatically generate intelligence reports for
events utilizing contextual narrative features found in OSINT. GraphRAG [39] addresses the challenge
that RAG fails on global questions directed at an entire text corpus and hence, introduces an approach
that builds a graph-based index from text corpora, and then partitions the graph into hierarchical
community structures. For query-focused summarization of an entire corpus, a map-reduce approach is
utilized, which first answers the query independently based on each community summary, and then
summarizes all relevant partial answers into a final global answer. For multi-document question
answering (MD-QA), [40] use Knowledge Graph Prompting (KGP)
to formulate the right context when prompting LLMs; KGP consists of a graph construction
module and a graph traversal module.</p>
    </sec>
    <sec id="sec-3">
      <title>3. CyKG-RAG Framework</title>
      <p>The objective of this research is to enhance existing LLMs with a domain-specific RAG approach that
leverages symbolic representations to address cybersecurity issues pertaining to private and dynamic
security-related information such as log data, security events and IT infrastructure. Additionally, we seek
to utilize existing cybersecurity information including information about vulnerabilities, weaknesses
and attack patterns to assist cybersecurity analysts in their daily tasks.</p>
      <p>In this section, we present our approach for a RAG system for cybersecurity. We discuss the RAG
architecture including KG construction, vector embedding of an existing cybersecurity KG, and the
retrieval mechanism. Finally, we present our prototype implementation.</p>
      <sec id="sec-3-1">
        <title>3.1. The RAG Architecture</title>
        <p>The architecture illustrated in Figure 1 gives an overview of our RAG system, which is specifically
tailored for cybersecurity and leverages knowledge graphs. It can be broken down into the following
components:</p>
        <p>(i) Knowledge Graph Construction. Our CyKG-RAG approach leverages knowledge graph
representations for domain-specific knowledge related to cybersecurity, including, e.g., log events from hosts,
network and infrastructure descriptions, as well as cybersecurity information. This information is
essential for security analysts in answering cybersecurity-related questions and is therefore integrated
into our RAG system.</p>
        <p>As depicted in Figure 1, the KG is constructed from two types of information sources: (i)
the Log &amp; Event Graphs are constructed from local log data and events, which represent private,
organization-specific information related to cybersecurity; (ii) the Cybersecurity Knowledge Graphs are
constructed from publicly available cybersecurity information, such as Common Vulnerabilities and Exposures (CVE), Common
Weakness Enumeration (CWE), Common Attack Pattern Enumeration and Classification (CAPEC),
MITRE ATT&amp;CK, and other threat intelligence. The external information is continuously updated
to ensure that the KG reflects the current landscape of cybersecurity threats and defenses. The
constructed KG can then be used to contextualize local security events with external cyber threat
intelligence to provide a more comprehensive understanding of potential threats.</p>
        <p>The KG can be constructed in two ways: (i) LLM-based KG construction, which
transforms semi-structured and unstructured information into a KG representation by means of LLMs; and
(ii) rule-based KG construction, which transforms structured information into a KG by using a
declarative mapping language (e.g., RML) that maps the data onto an ontology/schema.</p>
        <p>(ii) Vector Embedding in the Existing Cybersecurity KG. The advantage of incorporating vector
embeddings into a KG lies in their ability to enable flexible full-text search within unstructured elements
of the KG, particularly for security-related data that is typically unstructured (e.g., log messages, security
events). Vector embeddings in KG nodes help to identify relevant nodes and their relationships
to other entities. This capability is particularly helpful compared to traditional graph queries (e.g.,
SPARQL, Cypher), which may fail to retrieve information when the data within the KG is mismatched
or incomplete.</p>
        <p>The goal of vector similarity search is to locate elements (i.e., nodes) in the KG based on the similarity
of their vector embeddings. These embeddings are encoded and stored as properties within the nodes.
Typically, a node represents a chunk of data that is part of a larger document. In a cybersecurity context,
a document could be a security log source, and a chunk might be a log line containing a message about
particular events in the system.</p>
        <p>Figure 2 shows an example of similarity search using vector embeddings in a log KG. As shown in
the top-left section of the figure, the process starts with KG construction that transforms entries (e.g.,
"Mar 4 19:25:47 mail auth: pam_unix(dovecot): authentication failure; logname...") from a log source (e.g.,
Authlog) into a log graph. The log graph is then fragmented into smaller units called "Chunks" (e.g.,
Chunk-1, Chunk-2, Chunk-3, etc.). These chunks are nodes in the KG and represent individual log lines
associated with specific nodes that capture relevant entities (such as user "Daryl" and the "Dovecot"
mailserver) and their relationships (e.g., mentions, authentication failure). Each node in the KG, which
represents a chunk of log data, is associated with an embedding, i.e., a numerical representation that
captures the semantic meaning of the text within the chunk. These embeddings are then stored in a
Vector Index. The embeddings allow the system to perform a vector similarity search to retrieve the most
contextually relevant chunks in response to a query. For instance, the node property section shows
an embedding corresponding to Chunk-2, which is linked to an authentication failure event involving
Daryl. The chunk-wise embedding enables the system to match the query against the most relevant log
lines based on their semantic similarity rather than just keyword matching.
</p>
        <sec id="sec-3-1-1">
          <title>(iii) Retrieval and Question Answering Mechanism</title>
          <p>As depicted in Figure 1, the retrieval mechanism of our RAG approach consists of the following steps:</p>
          <p>• Security Analyst’s Question ①: The process starts with a security analyst who asks a
cybersecurity question in natural language. This question is integrated into a prompt and sent to the
LLM.</p>
          <p>• Graph Query &amp; Vector (Semantic) Search ②: Based on the given question and contextual
knowledge (i.e., KG schema, vocabulary), the LLM generates a graph query (either a
SPARQL query for Resource Description Framework (RDF) KGs or a Cypher query for Labelled
Property Graphs (LPGs)) that is executed against the KG. Note that the generated queries may
not always answer the given question, e.g., because the LLM may produce an invalid query or
the query may yield an empty result. To address this limitation, our framework combines query
generation with vector (semantic) search based on similarities between the vector representation
of the question and the embeddings stored in the KG. As depicted in
Figure 2, the question "Who is Daryl? What did he do on the System?" is first encoded into a query
embedding, i.e., a vector that represents the semantic meaning of the question, via an LLM. This
query embedding is then compared against the Vector Index containing the embeddings of log
data chunks. Through vector similarity search, the system identifies the most relevant chunks
and their associated nodes in the KG. The identified nodes can then be used as a starting point
to traverse other paths (e.g., via graph queries) and retrieve other information relevant to the
question.</p>
          <p>• Cybersecurity Knowledge and Log Event Graphs ③: The generated queries target two primary
data sources:</p>
          <p>– Log &amp; Event Graphs: These graphs represent private/local information constructed from
cybersecurity-related data such as logs, events and infrastructure information.</p>
          <p>– Cybersecurity Knowledge Graphs: These graphs represent external information that contains
broader cybersecurity knowledge constructed from publicly available data sources, such
as vulnerabilities, weaknesses, and attack patterns.</p>
          <p>• Results Retrieval ④: The results retrieved (i) from the Log &amp; Event and Cybersecurity Knowledge
Graphs based on a graph query and (ii) via vector (similarity) search are passed to the LLM
together with the prompt ⑤.</p>
          <p>• Synthesizing Results ⑥: The LLM processes and synthesizes the retrieved information to
generate a final answer that is presented to the security analyst.</p>
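          <p>The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the prototype's actual code: the function names, the dictionary layout, and the prompt wording are hypothetical, and the LLM, graph database, and vector index are passed in as ordinary callables so that no real service is contacted.</p>
          <preformat>
```python
def retrieve_context(question, generate_query, run_query, vector_search):
    """Run graph-query retrieval and vector search for one question.

    generate_query, run_query and vector_search are caller-supplied
    callables standing in for the LLM, the graph database and the
    vector index, respectively.
    """
    query, graph_rows = None, []
    try:
        query = generate_query(question)      # LLM writes a SPARQL/Cypher query
        graph_rows = list(run_query(query))   # query is executed against the KG
    except Exception:
        # An invalid generated query is handled like an empty result.
        graph_rows = []
    vector_hits = list(vector_search(question))  # semantic search always runs
    return {"query": query, "graph": graph_rows, "vector": vector_hits}


def build_prompt(question, context):
    """Assemble the prompt that is sent to the LLM for the final answer."""
    graph_lines = [str(row) for row in context["graph"]] or ["(no graph results)"]
    vector_lines = [str(hit) for hit in context["vector"]] or ["(no vector results)"]
    parts = ["Question: " + question, "Graph data:"]
    parts += graph_lines + ["Vector data:"] + vector_lines
    return "\n".join(parts)


# Stubs for illustration: the query generator fails, the vector search succeeds.
def failing_generator(question):
    raise ValueError("model produced an invalid query")


def fake_vector_search(question):
    return ["Mar 4 19:25:49 mail auth: pam_unix(dovecot:auth): authentication failure;.."]


context = retrieve_context("Who is Daryl?", failing_generator, lambda q: [], fake_vector_search)
prompt = build_prompt("Who is Daryl?", context)
```
          </preformat>
          <p>In this sketch, a failed or empty graph query does not abort the pipeline; the vector search results still reach the prompt, which mirrors the motivation for combining both retrieval routes.</p>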
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Prototypical Implementation</title>
        <p>
          (i) KG Construction. Our approach to KG construction is based on two fundamental techniques. The
first is rule-based construction, which we leverage to ingest structured cybersecurity information. For
this purpose, we reuse the existing CSKG [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which employs the RDF Mapping Language (RML) to map
and transform structured information into a KG. We used Ontotext GraphDB<sup>6</sup>, an open-source graph
database, to store the RDF data.
        </p>
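        <p>To illustrate the principle behind rule-based construction, the following minimal Python sketch applies a declarative mapping to a structured record and emits subject-predicate-object triples. It is not RML and not the mapping used by the reused CSKG: the mapping layout, the ex: prefix, and the placeholder record are assumptions made for illustration only.</p>
        <preformat>
```python
# Hypothetical declarative mapping: it states which record field becomes
# which predicate, so no mapping logic is hard-coded in the function below.
CVE_MAPPING = {
    "subject_template": "ex:cve/{id}",
    "rdf_type": "ex:CVE",
    "predicates": {
        "id": "dcterms:identifier",
        "description": "dcterms:description",
    },
}


def record_to_triples(record, mapping):
    """Turn one structured record into a list of (s, p, o) triples."""
    subject = mapping["subject_template"].format(**record)
    triples = [(subject, "rdf:type", mapping["rdf_type"])]
    for field, predicate in mapping["predicates"].items():
        if field in record:  # fields missing from the record are skipped
            triples.append((subject, predicate, record[field]))
    return triples


# Placeholder record; identifier and description are not real CVE data.
record = {"id": "CVE-0000-0001", "description": "Placeholder description."}
triples = record_to_triples(record, CVE_MAPPING)
```
        </preformat>
        <p>Keeping the mapping as data rather than code is what makes this style declarative: supporting a new source would mean adding another mapping dictionary, not another function.</p>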
        <p>Second, we utilize LLM-based construction to transform semi-structured and unstructured information
from private or local data (e.g., log sources) into a KG. In particular, we use LangChain LLM Graph
Transformers<sup>7</sup>, a Python-based tool that converts documents into graph-based formats using an LLM. In
our experiments, we used ChatGPT 3.5-turbo<sup>8</sup> to perform these tasks.</p>
        <p>To create vector embeddings for the constructed KGs, we leverage OpenAIEmbedding<sup>9</sup>, a service
provided by OpenAI that generates vector embeddings for text data. The generated vector embeddings
are then stored in a Neo4j vector store<sup>10</sup>.</p>
        <p>(ii) Simple Chatbot Interface for Q&amp;A. To facilitate the interaction of security analysts with the
CyKG-RAG system, we developed a simple web-based chatbot interface, as shown in Figure 3.
We used Streamlit<sup>11</sup>, an open-source Python framework for rapidly building web apps. Within the resulting
app, security analysts can ask security-relevant questions in natural language and receive a response
together with an explanation (e.g., the actual query and its results from the KG).</p>
        <p>The interface consists of several elements: (i) a Question Area where the security analyst can input
queries, (ii) a Final Answers Box that displays the answers generated by the system, (iii) a Generated
Graph Query section that shows the actual query created by the LLM, which could be either a SPARQL
or Cypher query, and (iv) a Query Result section that provides the data passed to the LLM, including
results obtained from the graph database or vector embeddings.</p>
        <p><sup>6</sup> https://www.ontotext.com/products/graphdb/?ref=menu</p>
        <p><sup>7</sup> https://python.langchain.com/v0.1/docs/use_cases/graph/constructing/</p>
        <p><sup>8</sup> https://platform.openai.com</p>
        <p><sup>9</sup> https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/</p>
        <p><sup>10</sup> https://neo4j.io/</p>
        <p><sup>11</sup> https://streamlit.io/</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment and Use Case Application</title>
      <p>In this section, we describe a series of experiments that we conducted to demonstrate the feasibility of
our RAG approach for cybersecurity-related question-answering and analysis tasks. We started with
general cybersecurity questions related to cyber threat intelligence and vulnerability assessment using
continuously updated CVE information, followed by log analysis of a specific log dataset. Different
colors are used to distinguish the parts of our retrieval process: (i) the blue text box represents the
natural-language question asked by the security analyst, (ii) the green box represents the graph query
generated by the LLM (e.g., SPARQL, Cypher), (iii) the red box represents the query results obtained from the graph database
or the retrieval results from vector similarity search, and (iv) the brown box represents the final answer to the
question.</p>
      <p>(i) General Questions on Cyber Threat Intelligence. In this demonstration, we leverage our
existing SEPSES CSKG<sup>12</sup> as a knowledge source for our RAG system. In this case, the system can be
used to answer questions related to cyber threat intelligence (e.g., MITRE ATT&amp;CK<sup>13</sup>), including queries
about attack techniques, tactics, etc.</p>
      <sec id="sec-4-1">
        <title>Question</title>
        <p>Which MITRE ATT&amp;CK techniques are used by attackers to escalate their privileges within a network?</p>
      </sec>
      <sec id="sec-4-2">
        <title>Generated SPARQL Query</title>
        <preformat>PREFIX dcterms: &lt;http://purl.org/dc/terms/&gt;
PREFIX attack: &lt;http://w3id.org/sepses/vocab/ref/attack#&gt;
SELECT ?technique ?title WHERE {
  ?technique a attack:Technique .
  ?technique dcterms:description ?description .
  FILTER(CONTAINS(?description, "escalate")
         &amp;&amp; CONTAINS(?description, "privileges"))
  ?technique dcterms:title ?title .
}</preformat>
        <p><sup>12</sup> https://w3id.org/sepses/</p>
        <p><sup>13</sup> https://attack.mitre.org/</p>
        <p>Query Result passed to the LLM</p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>#</th><th>technique</th><th>title</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>res:attack/technique/boot-or-logon-initialization-scripts</td><td>"Boot or Logon Initialization Scripts"</td></tr>
              <tr><td>2</td><td>res:attack/technique/token-impersonation-theft</td><td>"Token Impersonation/Theft"</td></tr>
              <tr><td>3</td><td>res:attack/technique/windows-service</td><td>"Windows Service"</td></tr>
              <tr><td>4</td><td>...</td><td>"..."</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>(ii) Vulnerability Assessment over Continuously Updated Vulnerability Information. With
the same knowledge source we used in use case (i), we demonstrate how our RAG system can effectively
handle the dynamic retrieval of rapidly evolving cybersecurity information. For instance, the system
can assess the current vulnerability of a specific component or product identified by its CVE ID. This
would not be possible with an LLM alone, since it would require frequent updates (e.g., fine-tuning)
whenever CVEs are released.</p>
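        <p>In this use case the query is generated by the LLM. As a complementary, hedged sketch, a query of the same shape could also be filled in from a fixed template, which keeps the product name from altering the query structure. The helper names below are hypothetical, and the sketch deliberately swaps in CONTAINS with LCASE for a literal, case-insensitive match rather than a regular expression; the CVE class IRI and the dcterms properties are the ones used with the SEPSES vocabulary in this section.</p>
        <preformat><![CDATA[
```python
QUERY_TEMPLATE = """PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?identifier ?description
WHERE {{
  ?cve a <http://w3id.org/sepses/vocab/ref/cve#CVE> ;
       dcterms:identifier ?identifier ;
       dcterms:description ?description .
  FILTER CONTAINS(LCASE(?description), LCASE({needle}))
}} LIMIT {limit}"""


def sparql_string_literal(value):
    """Quote a Python string as a double-quoted SPARQL string literal."""
    escaped = (
        value.replace("\\", "\\\\")   # backslashes first, then the rest
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def build_cve_query(product, limit=3):
    """Build a query for CVE entries whose description mentions the product."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return QUERY_TEMPLATE.format(needle=sparql_string_literal(product), limit=limit)


query = build_cve_query("C-MORE EA9 HMI")
```
]]></preformat>
        <p>Because the product name is escaped before it is placed into the template, quotation marks or backslashes in the input end up inside the string literal instead of terminating it.</p>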
      </sec>
      <sec id="sec-4-3">
        <title>Question</title>
        <p>What is the CVE identifier and its description that contains the vulnerability of C-MORE EA9 HMI? Mention three of
them!</p>
      </sec>
      <sec id="sec-4-4">
        <title>Generated SPARQL Query</title>
        <preformat>PREFIX dcterms: &lt;http://purl.org/dc/terms/&gt;
SELECT ?identifier ?description
WHERE {
  ?cve a &lt;http://w3id.org/sepses/vocab/ref/cve#CVE&gt; ;
       dcterms:identifier ?identifier ;
       dcterms:description ?description .
  FILTER regex(?description, "C-MORE EA9 HMI", "i")
} LIMIT 3</preformat>
        <p>Query result passed to the LLM</p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>#</th><th>identifier</th><th>description</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>"CVE-2024-25138"</td><td>"In AutomationDirect C-MORE EA9 HMI, credentials used by the platform .."</td></tr>
              <tr><td>2</td><td>"CVE-2024-25137"</td><td>"In AutomationDirect C-MORE EA9 HMI there is a program that copies a .."</td></tr>
              <tr><td>3</td><td>"CVE-2024-25136"</td><td>"There is a function in AutomationDirect C-MORE EA9 HMI that allows an .."</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-4-5">
        <title>Final Answer</title>
        <p>Three CVE identifiers related to vulnerabilities in AutomationDirect C-MORE EA9 HMI are CVE-2024-25138,
CVE-2024-25137, and CVE-2024-25136.</p>
        <p>CVE-2024-25138: In AutomationDirect C-MORE EA9 HMI, credentials used by the platform are stored as plain text
on the device.</p>
        <p>CVE-2024-25137: In AutomationDirect C-MORE EA9 HMI, there is a program that copies a buffer of a size controlled
by the user into a limited sized buffer..</p>
        <p>CVE-2024-25136: There is a function in AutomationDirect C-MORE EA9 HMI that allows an attacker to send a
relative path in ..</p>
        <p>(iii) Security Log Analysis. This scenario demonstrates the use of our RAG system to facilitate
question answering against private or local information (e.g., log sources). For this experiment, we used
a dataset derived from the AIT dataset [41] and constructed a KG from it using an LLM. Additionally,
we generated vector embeddings to enable full-text search against the constructed KG.</p>
      </sec>
      <sec id="sec-4-6">
        <title>Graph and vector data passed to the LLM</title>
        <preformat>Graph data:
Dovecot - USES -&gt; Daryl
Dovecot - AUTHENTICATES -&gt; Daryl
Dovecot - AUTHENTICATION_FAILURE -&gt; Daryl
Dovecot - AUTHORIZES -&gt; Daryl
Dovecot - FAILED_AUTHENTICATION -&gt; Daryl
Vector data:
text: Mar 4 19:25:49 mail auth: pam_unix(dovecot:auth): authentication failure;..
text: Mar 4 19:25:48 mail auth: pam_unix(dovecot:auth): authentication failure;..
text: Mar 4 19:28:35 mail auth: pam_unix(dovecot:auth): authentication failure;..
text: Mar 4 19:25:54 mail auth: pam_unix(dovecot:auth): authentication failure;..</preformat>
        <p>Final answer: Daryl is a user on the system who attempted to authenticate multiple times but failed each time. He tried to log in at
19:25:49, 19:25:48, 19:28:35, and 19:25:54.</p>
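        <p>The relationships in this use case were extracted by an LLM. For comparison, a deterministic, rule-based counterpart for this one log format could look like the following minimal sketch. The regular expressions, the function name, and the sample line's key=value tail (including user=daryl) are assumptions for illustration; they are not taken from the dataset or from the prototype.</p>
        <preformat><![CDATA[
```python
import re

# Matches syslog-style pam_unix lines: timestamp, host, process, service, event.
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<process>[^:]+):\s+"
    r"pam_unix\((?P<service>[^:)]+)(?::[^)]*)?\):\s+"
    r"(?P<event>[^;]+);(?P<rest>.*)$"
)
# Matches "user=..." but not "ruser=...".
USER_PATTERN = re.compile(r"(?:^|\s)user=(\S+)")


def line_to_graph(chunk_id, line):
    """Return a chunk node and entity triples extracted from one log line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return {"id": chunk_id, "text": line}, []  # keep the chunk, no entities
    chunk = {
        "id": chunk_id,
        "text": line,
        "timestamp": match["timestamp"],
        "host": match["host"],
    }
    service = match["service"].capitalize()
    relation = match["event"].strip().upper().replace(" ", "_")
    triples = [(chunk_id, "MENTIONS", service)]
    user = USER_PATTERN.search(match["rest"])
    if user:
        name = user.group(1).capitalize()
        triples.append((chunk_id, "MENTIONS", name))
        triples.append((service, relation, name))
    return chunk, triples


# Hypothetical sample line; the fields after the semicolon are assumed.
sample = ("Mar 4 19:25:47 mail auth: pam_unix(dovecot:auth): authentication failure; "
          "logname= uid=0 euid=0 tty=dovecot ruser=daryl rhost=127.0.0.1 user=daryl")
chunk, triples = line_to_graph("Chunk-2", sample)
```
]]></preformat>
        <p>A rule of this kind yields exactly one relation name per event type, whereas LLM-based extraction can emit several near-synonymous relations for the same event; the trade-off is that each new log format needs its own pattern.</p>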
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>Cybersecurity is a domain characterized by complex and fast-evolving domain knowledge as well as
highly dynamic environments and information flows in various forms (such as logs). Furthermore,
the relational nature of such information requires means to connect chunks of disparate information
effectively and reliably. These characteristics pose a challenge for LLMs, which have shown significant
potential in the cybersecurity domain, but cannot easily be retrained on updated information, have
difficulties operating on factual information reliably, and are prone to effects like hallucination.</p>
      <p>In this paper, we proposed to combine multiple retrieval-augmented generation techniques to tackle
such inherent limitations and leverage LLMs in cybersecurity analysis workflows. To this end, we
introduced CyKG-RAG, a RAG-based question-answering system that combines symbolic representation of
security knowledge in (knowledge) graph structures with query generation and vector embedding-based
semantic similarity search to retrieve relevant information in real-time and synthesize meaningful
answers. Our preliminary evaluation of CyKG-RAG in example use cases, ranging from general
cybersecurity knowledge to questions on dynamic knowledge such as emerging vulnerabilities and
on real-time log data, shows promising results. In future work, we will evaluate the approach
more comprehensively in a broader set of experimental settings and investigate the relative effectiveness
of various RAG techniques and different approaches to combining them in different scenarios.</p>
      <p>Acknowledgements. This work has been partially supported and funded by the Austrian Research
Promotion Agency (FFG) via the Austrian Competence Center for Digital Production (CDP) under the
contract number 881843. SBA Research (SBA-K1) is a COMET Centre within the COMET – Competence
Centers for Excellent Technologies Programme and funded by BMK, BMAW, and the federal state of
Vienna. COMET is managed by FFG. This work is part of the TEAMING.AI project, which receives
funding from the European Commission’s Horizon 2020 Research Programme under Grant Agreement
Number 957402 (www.teamingai-project.eu).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT in order to: Grammar and spelling
check. After using this tool, the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication’s content.</p>
      <p>[19] V. Raskin, C. Hempelmann, K. Triezenberg, S. Nirenburg, Ontology in information security: A useful
theoretical foundation and methodological tool, in: Proceedings of the 2001 Workshop on New Security
Paradigms, 2001. doi:10.1145/508171.508180.</p>
      <p>[20] M. Schumacher, Toward a Security Core Ontology, 2003. doi:10.1007/978-3-540-45180-8_6.</p>
      <p>[21] J. Undercoffer, A. Joshi, J. Pinkston, Modeling computer attacks: An ontology for intrusion detection, in:
Recent Advances in Intrusion Detection, 2003.</p>
      <p>[22] A. Souag, C. Salinesi, I. Comyn-Wattiau, Ontologies for security requirements: A literature survey and
classification, in: Advanced Information Systems Engineering Workshops, 2012.</p>
      <p>[23] M. Iannacone, S. Bohn, G. Nakamura, J. Gerth, K. Huffer, R. Bridges, E. Ferragut, J. Goodall, Developing an
ontology for cyber security knowledge graphs, 2015. doi:10.1145/2746266.2746278.</p>
      <p>[24] K. Kurniawan, A. Ekelhart, E. Kiesling, An att&amp;ck-kg for linking cybersecurity attacks to adversary tactics and
techniques, in: The Semantic Web – ISWC 2021, 2021, p. 5. URL: https://ceur-ws.org/Vol-2980/paper363.pdf.</p>
      <p>[25] K. Liu, F. Wang, Z. Ding, S. Liang, Z. Yu, Y. Zhou, Recent Progress of Using Knowledge Graph for
Cybersecurity, Electronics 11 (2022) 2287. doi:10.3390/electronics11152287.</p>
      <p>[26] K. Liu, F. Wang, Z. Ding, S. Liang, Z. Yu, Y. Zhou, A review of knowledge graph application scenarios in
cyber security, 2022. URL: http://arxiv.org/abs/2204.04769. arXiv:2204.04769.</p>
      <p>[27] L. F. Sikos, Cybersecurity knowledge graphs, Knowledge and Information Systems 65 (2023) 3511–3531.
doi:10.1007/s10115-023-01860-3.</p>
      <p>[28] X. Zhao, R. Jiang, Y. Han, A. Li, Z. Peng, A survey on cybersecurity knowledge graph construction,
Computers &amp; Security 136 (2024) 103524. doi:10.1016/j.cose.2023.103524.</p>
      <p>[29] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, M. Wang, H. Wang, Retrieval-augmented generation
for large language models: A survey, 2024. URL: https://arxiv.org/abs/2312.10997. arXiv:2312.10997.</p>
      <p>[30] M. Trajanoska, R. Stojanov, D. Trajanov, Enhancing knowledge graph construction using large language
models, 2023. URL: https://arxiv.org/abs/2305.04676. arXiv:2305.04676.</p>
      <p>[31] L. Yao, J. Peng, C. Mao, Y. Luo, Exploring large language models for knowledge graph completion, 2024.
URL: https://arxiv.org/abs/2308.13916. arXiv:2308.13916.</p>
      <p>[32] T. Ban, L. Chen, X. Wang, H. Chen, From query tools to causal architects: Harnessing large language models
for advanced causal discovery from data, 2023. URL: https://arxiv.org/abs/2306.16902. arXiv:2306.16902.</p>
      <p>[33] Y. Zhang, Y. Zhang, Y. Gan, L. Yao, C. Wang, Causal graph discovery with retrieval-augmented generation
based large language models, 2024. URL: https://arxiv.org/abs/2402.15301. arXiv:2402.15301.</p>
      <p>[34] J. Baek, A. F. Aji, A. Safari, Knowledge-augmented language model prompting for zero-shot knowledge
graph question answering, 2023. URL: https://arxiv.org/abs/2306.04136. arXiv:2306.04136.</p>
      <p>[35] X. He, Y. Tian, Y. Sun, N. V. Chawla, T. Laurent, Y. LeCun, X. Bresson, B. Hooi, G-retriever:
Retrieval-augmented generation for textual graph understanding and question answering, 2024.
URL: https://arxiv.org/abs/2402.07630. arXiv:2402.07630.</p>
      <p>[36] J. Zhang, Graph-toolformer: To empower llms with graph reasoning ability via prompt augmented by
chatgpt, 2023. URL: https://arxiv.org/abs/2304.11116. arXiv:2304.11116.</p>
      <p>[37] M. Kang, J. M. Kwak, J. Baek, S. J. Hwang, Knowledge graph-augmented language models for
knowledge-grounded dialogue generation, 2023. URL: https://arxiv.org/abs/2305.18846. arXiv:2305.18846.</p>
      <p>[38] P. Ranade, A. Joshi, Fabula: Intelligence report generation using retrieval-augmented narrative
construction, in: Proceedings of the International Conference on Advances in Social Networks Analysis and
Mining, ASONAM ’23, ACM, 2023. URL: http://dx.doi.org/10.1145/3625007.3627505. doi:10.1145/3625007.3627505.</p>
      <p>[39] D. Edge, H. Trinh, N. Cheng, J. Bradley, A. Chao, A. Mody, S. Truitt, J. Larson, From local to global: A graph
rag approach to query-focused summarization, 2024.
URL: https://www.microsoft.com/en-us/research/publication/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization/.</p>
      <p>[40] Y. Wang, N. Lipka, R. A. Rossi, A. Siu, R. Zhang, T. Derr, Knowledge graph prompting for multi-document
question answering, 2023. URL: https://arxiv.org/abs/2308.11730. arXiv:2308.11730.</p>
      <p>[41] M. Landauer, F. Skopik, M. Wurzenberger, W. Hotwagner, A. Rauber, Ait log data set v1.1, 2020.
URL: https://doi.org/10.5281/zenodo.4264796. doi:10.5281/zenodo.4264796.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekelhart</surname>
          </string-name>
          ,
          <article-title>Formalizing information security knowledge</article-title>
          ,
          <source>in: Proceedings of the 4th International Symposium on Information, Computer, and Communications Security</source>
          ,
          <year>2009</year>
          . doi:10.1145/1533057.1533084.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kiesling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kurniawan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ekaputra</surname>
          </string-name>
          ,
          <article-title>The SEPSES Knowledge Graph: An Integrated Resource for Cybersecurity</article-title>
          , in:
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maleshkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Svátek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lefrançois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gandon</surname>
          </string-name>
          (Eds.),
          <source>The Semantic Web - ISWC 2019</source>
          , volume
          <volume>11779</volume>
          of Lecture Notes in Computer Science, Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>198</fpage>
          -
          <lpage>214</lpage>
          . doi:10.1007/978-3-030-30796-7_13.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kurniawan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kiesling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Quirchmayr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Tjoa</surname>
          </string-name>
          ,
          <article-title>Krystal: Knowledge graph-based framework for tactical attack discovery in audit data</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>121</volume>
          (
          <year>2022</year>
          ) 102828. doi:10.1016/j.cose.2022.102828.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] HoloLen, Cybersecurity Knowledge Graph,
          <year>2020</year>
          . URL: https://github.com/HoloLen/CyberSecurity_Knowledge_graph.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sarhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spruit</surname>
          </string-name>
          ,
          <article-title>Open-cykg: An open cyber threat intelligence knowledge graph</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>233</volume>
          (
          <year>2021</year>
          ) 107524. doi:10.1016/j.knosys.2021.107524.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Syed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Padia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mathews</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Finin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <article-title>UCO: A Unified Cybersecurity Ontology</article-title>
          ,
          <source>in: Proceedings of the AAAI Workshop on Artificial Intelligence for Cyber Security</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Ekaputra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kiesling</surname>
          </string-name>
          ,
          <article-title>The slogert framework for automated log knowledge graph construction</article-title>
          , in:
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Champin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maleshkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alam</surname>
          </string-name>
          (Eds.),
          <source>The Semantic Web</source>
          , Springer International Publishing, Cham,
          <year>2021</year>
          , pp.
          <fpage>631</fpage>
          -
          <lpage>646</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maarek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Baillie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-W.</given-names>
            <surname>Loidl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Chatzifoti</surname>
          </string-name>
          ,
          <article-title>Introducing and interfacing with cybersecurity - a cards approach</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2307.16535. arXiv:2307.16535.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Minaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nikzad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chenaghlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Amatriain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Large language models: A survey</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2402.06196. arXiv:2402.06196.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>A survey of large language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2303.18223. arXiv:2303.18223.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>A Survey on Large Language Model (LLM) Security and Privacy: The Good, The Bad, and The Ugly</article-title>
          ,
          <source>High-Confidence Computing</source>
          <volume>4</volume>
          (
          <year>2024</year>
          ) 100211. URL: https://www.sciencedirect.com/science/article/pii/S266729522400014X. doi:10.1016/j.hcc.2024.100211.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Self-consistency improves chain of thought reasoning in language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2203.11171. arXiv:2203.11171.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2311.05232. arXiv:2311.05232.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
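          ,
          <string-name>
            <given-names>W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,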
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledge-intensive nlp tasks</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2005.11401. arXiv:2005.11401.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hewitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paranjape</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bevilacqua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Lost in the middle: How language models use long contexts</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2307.03172. arXiv:2307.03172.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Neubauer</surname>
          </string-name>
          ,
          <article-title>Aurum: A framework for information security risk management</article-title>
          ,
          <source>in: Proceedings of the 42nd Hawaii International Conference on System Sciences</source>
          ,
          <year>2009</year>
          . doi:10.1109/HICSS.2009.82.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <article-title>Security ontology for annotating resources</article-title>
          ,
          <source>in: On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Martimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. d. S.</given-names>
            <surname>Moreira</surname>
          </string-name>
          ,
          <article-title>An owl-based security incident ontology</article-title>
          ,
          <source>in: Proceedings of the Eighth International Protege Conference</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>