<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AISecKG: Knowledge Graph Dataset for Cybersecurity Education</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Garima Agrawal</string-name>
          <email>garima.agrawal@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kuntal Pal</string-name>
          <email>kkpal@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuli Deng</string-name>
          <email>ydeng19@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Huan Liu</string-name>
          <email>huanliu@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chitta Baral</string-name>
          <email>chitta@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing and Augmented Intelligence, Arizona State University</institution>
          ,
          <addr-line>Tempe, AZ</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Cybersecurity education is exceptionally challenging because it involves learning complex attacks and tools and developing the critical problem-solving skills needed to defend systems. For a student or novice researcher in the cybersecurity domain, there is a need to design an adaptive learning strategy that can break complex tasks and concepts into simple representations. An AI-enabled automated cybersecurity education system can improve cognitive engagement and active learning. Knowledge graphs (KGs) provide a visual, graph-based representation that supports reasoning over and interpretation of the underlying data, making them suitable for use in education and interactive learning. However, there are no publicly available datasets for the cybersecurity education domain to build such systems. The data is present as unstructured educational course material, Wiki pages, capture the flag (CTF) writeups, etc. Creating knowledge graphs from unstructured text is challenging without an ontology or annotated dataset, and data annotation for cybersecurity requires domain experts. To address these gaps, we make three contributions in this paper. First, we propose an ontology for the cybersecurity education domain for students and novice learners. Second, we develop AISecKG, a triple dataset with cybersecurity-related entities and relations as defined by the ontology. This dataset can be used to construct knowledge graphs to teach cybersecurity and promote cognitive learning. It can also be used to build downstream applications like recommendation systems or self-learning question-answering systems for students. The dataset would also help identify malicious named entities and their probable impact. Third, using this dataset, we show a downstream application to extract custom named entities from texts and educational material on cybersecurity.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph</kwd>
        <kwd>Cybersecurity Education</kwd>
        <kwd>Ontology</kwd>
        <kwd>Knowledge Base</kwd>
        <kwd>KG Dataset</kwd>
        <kwd>Language Model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Learning cybersecurity requires mastering the academic content and developing critical thinking
and problem-solving skills based on cyber attacks and defense scenarios. We can achieve this
interactive and active learning by creating an AI-powered education system where students
can control their learning process [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. Knowledge graphs have been used effectively in
education to improve the learning experience [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        A knowledge graph combines two things: a graph of domain-specific data and an explicit
representation of knowledge. The graph captures the key domain concepts and their
interactions with each other. It allows the user to analyze and understand the connections or
relationships between different entities. The explicit knowledge, or metadata, provides the
relevant background and important information about the domain. This metadata allows the
system to establish a common vocabulary and use shared references. Knowledge graphs are
thus an integrated tool that can use the underlying data and knowledge for concept visualization
and contextual reasoning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Their ability to translate data into usable knowledge makes them
suitable for education. They can promote cognitive engagement in the problem-based learning
environment.
      </p>
      <p>However, building such systems requires annotated datasets, and there are no public
datasets in the cybersecurity education domain. The education material consists of unstructured
text in lecture notes, lab manuals, Wiki pages, capture the flag (CTF) writeups, and others.
Scraping unstructured text and creating domain-specific knowledge graphs is challenging,
especially without a standard ontology and annotated datasets. Annotating data
is expensive and time-consuming because it can be done accurately only by cybersecurity domain
experts. The increase in demand for cybersecurity professionals requires preparing an
effective cybersecurity specialist workforce and equipping them with intelligent learning tools.
The lack of a comprehensive dataset with cybersecurity-related named entities is a significant bottleneck
in this area.</p>
      <p>In this paper, we address this issue by making three main contributions. First, using domain
knowledge, we propose an ontology for self-paced cybersecurity learning for novice users.
Second, we create an annotated named entity dataset. Using this dataset, we show one downstream
application to extract named entities from texts and educational material on cybersecurity. Third,
we present a triple dataset, AISecKG, for cybersecurity education as defined by our ontology.
Using the triples data, we show one downstream task to construct a concept flow graph for a
cybersecurity tool. It is possible to write graph queries to generate sub-graphs focusing on the
specific learning needs of a user. Also, by combining the AISecKG ontology schema and the
cybersecurity named entities, knowledge graphs can be created from any unstructured texts
on cybersecurity. Our ontology and labeled dataset can also be used to build applications like
question-answering and recommendation systems for students.</p>
      <p>The paper is organized as follows. In the next section, we discuss the related work. Section 3
describes the method and ontology. Section 4 presents our work’s results and two applications.
Finally, we conclude the paper in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Security plays an integral role in software development and has become more critical with the
Internet of Things (IoT). Various studies that formalize security and develop security ontologies
and knowledge models address different security aspects. Souag et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] gave a security
ontology to elicit security requirements. The Unified Cybersecurity Ontology (UCO) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] focused
on identifying the vulnerabilities and threat levels to assess the system security. Doynikova et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
proposed an ontology on security metrics for cybersecurity assessment to determine the attack
goal. An extensive study on formalizing information security focuses on security concepts, and
threat mitigation and control process [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. A cybersecurity ontology was proposed to build and
monitor security in the cloud for IoT environments [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Iannacone et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] developed an
ontology for managing a cybersecurity knowledge database from different data sources to propose
a search mechanism for blacklisted systems. MALOnt [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] gives the ontology and knowledge
graphs for malware threat intelligence. Martins et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] presented a conceptual characterization
of available cybersecurity ontologies based on their application. In this work, we introduce
an ontology, AISecKG, that covers a broader spectrum of the fundamental concepts, tools,
techniques, and applications used in the cybersecurity ecosystem. This ontology is intended to be
helpful for any first-time user or new learner in the cybersecurity domain.
      </p>
      <p>
        Many datasets have also been proposed in the cybersecurity domain, but most are based
on network flow data and are used to train machine learning algorithms to build intrusion
detection systems. Alshaibi et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] gave a comparative study of these datasets. Recently
more efforts have been made to create cybersecurity named-entity datasets. A new dataset for
event detection in cybersecurity texts [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] annotated 30 types of critical events in cybersecurity
to train machine learning models. A named-entity recognition (NER) Python library called
CyNer [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] based on MALOnt ontology [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] was developed to extract the malware and threat
indicators. Language models were also built for cybersecurity [
        <xref ref-type="bibr" rid="ref17 ref18 ref19">17, 18, 19</xref>
        ] using the open source
CVE [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], and NVD Mitre [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] datasets on vulnerability and attacks. Dasgupta et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] gave a
comparative study on NER algorithms based on these datasets for cybersecurity. In our current
work, we develop a labeled named-entity dataset for cybersecurity based on the AISecKG ontology,
which can be used to build knowledge graphs from any unstructured text on cybersecurity.
      </p>
      <p>Most of the available cybersecurity ontologies and datasets are used to build intrusion detection
systems or perform vulnerability analysis and threat detection. Table 1 compares our ontology
and cybersecurity dataset with existing works. There is limited research on educating novice
learners in cybersecurity concepts, tools, and techniques. Deng et al. [23] proposed using
knowledge graphs as lab project guidance to teach cybersecurity. They focused on finding similar
concepts on the web using similarity measures [24] and word embeddings [25]. In our paper
[26], we proposed a semi-automated approach to build knowledge graphs from unstructured
cybersecurity course material and conducted a survey and interview with students to assess
their perception of knowledge graphs as a problem-solving educational aid. The
students found the knowledge graphs very useful, which motivated us to propose AISecKG, a
comprehensive ontology and a labeled dataset that can be used to build AI systems for learning
cybersecurity. In this work, we give a detailed ontology to understand the cybersecurity
ecosystem from different views and present an annotated named-entity recognition dataset to
extract cybersecurity-related entities.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method for Development of AISecKG</title>
      <p>Cybersecurity is the application of state-of-the-art technologies, control processes, policies,
tools, and procedures for protecting or recovering systems and information from malicious
attacks [27]. As a novice learner, one must know the cybersecurity ecosystem comprising
fundamental concepts, tools, and techniques and how to use and deploy them to assess and
detect vulnerabilities and attacks. This section presents our ontology design and dataset for
cybersecurity education called AISecKG.</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Cybersecurity Model</th><th>Purpose</th><th>Ontology</th><th>Dataset</th></tr>
          </thead>
          <tbody>
            <tr><td>Souag et al. [<xref ref-type="bibr" rid="ref6">6</xref>]</td><td>Security Requirement Elicitation</td><td>✓</td><td>X</td></tr>
            <tr><td>UCO [<xref ref-type="bibr" rid="ref7">7</xref>]</td><td>Vulnerability Assessment</td><td>✓</td><td>X</td></tr>
            <tr><td>Doynikova et al. [<xref ref-type="bibr" rid="ref8">8</xref>]</td><td>Security Metrics</td><td>✓</td><td>X</td></tr>
            <tr><td>Fenz et al. [<xref ref-type="bibr" rid="ref9">9</xref>]</td><td>Threat Mitigation and control</td><td>✓</td><td>X</td></tr>
            <tr><td>Mozzaquatro et al. [<xref ref-type="bibr" rid="ref10">10</xref>]</td><td>IoT Security Monitoring</td><td>✓</td><td>X</td></tr>
            <tr><td>Iannacone et al. [<xref ref-type="bibr" rid="ref11">11</xref>]</td><td>Search cybersecurity knowledge base</td><td>✓</td><td>X</td></tr>
            <tr><td>Alshaibi et al. [<xref ref-type="bibr" rid="ref14">14</xref>]</td><td>Intrusion Detection Models</td><td>X</td><td>Network Flow datasets</td></tr>
            <tr><td>Tikhomirov et al. [<xref ref-type="bibr" rid="ref18">18</xref>]</td><td>Vulnerability/Attack detection</td><td>X</td><td>Open Source (CVE/NVD)</td></tr>
            <tr><td>Ma et al. [<xref ref-type="bibr" rid="ref19">19</xref>]</td><td>Vulnerability/Attack detection</td><td>X</td><td>Open Source (CVE/NVD)</td></tr>
            <tr><td>Gao et al. [<xref ref-type="bibr" rid="ref17">17</xref>]</td><td>Vulnerability/Attack detection</td><td>X</td><td>Open Source (CVE/NVD)</td></tr>
            <tr><td>Trong et al. [<xref ref-type="bibr" rid="ref15">15</xref>]</td><td>Event Detection</td><td>X</td><td>✓ (Annotated dataset)</td></tr>
            <tr><td>MALOnt [<xref ref-type="bibr" rid="ref12">12</xref>]</td><td>Malware Threat Intelligence KG</td><td>✓</td><td>✓ (Annotated dataset)</td></tr>
            <tr><td>CyNer [<xref ref-type="bibr" rid="ref16">16</xref>]</td><td>Malware Threat Entity Extraction</td><td>X (using MALOnt)</td><td>X</td></tr>
            <tr><td>AISecKG</td><td>Cybersecurity education KG</td><td>✓</td><td>✓ (Annotated Triples KG Dataset)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>We propose a comprehensive view of the concepts, applications, and roles involved in the
cybersecurity ecosystem. Since the objective is to build a self-paced learning tool for cybersecurity
students and novice learners, our data source is the graduate-level course material and hands-on lab
instruction manuals used to teach graduate students majoring in cybersecurity.
Our ontology, AISecKG, is built using domain knowledge and motivated by the lab guides. The
dataset is created by annotating the lab documents using the AISecKG ontology. We then use this
annotated dataset to develop two applications. The first application is to train a language model
on our dataset to extract named entities related to cybersecurity. The second application is to
create a triple dataset of entity-relation-entity triples to construct knowledge graphs. Both
applications are described in Section 4.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Source</title>
        <p>We collected data from the laboratory instruction manuals. The manuals are for projects of
advanced cybersecurity courses for graduate students. These courses cover topics such as
using tools like NMap and Snort [28] to build intrusion detection systems, employing honeypot
techniques in Metasploit framework to deceive attackers, setting up Kali Linux systems, and
monitoring system activities and attack events using Syslog. The manuals are in standard
English and explain the concepts and instructions for implementing laboratory tasks. Each
manual is 15-20 pages long. For annotation, we used six such lab manuals with a total of 100
pages with approximately 26886 words and 110953 characters.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Ontology</title>
        <p>Ontology is a formal and explicit schematic representation of a system using a well-defined
taxonomy. It allows semantic modeling of the domain knowledge and thus can be used as the
skeleton to build any AI application for that system [29]. Ontology also defines the rules and
constraints of the system and facilitates the validation of semantic relationships and conclusions
or inferences from known facts. For a knowledge-based system, ontology serves as a backbone
of the system and should be meticulously developed.</p>
        <p>Some deep learning-based methods rely on automatically building the AI application from the
data without using an ontology, but they fail to capture the comprehensive view of the domain,
and the quality of applications becomes questionable [30]. On the other hand, if a well-defined
structured dataset is unavailable and most of the knowledge is present in unstructured texts, it
is overwhelming for a domain expert to scrape long texts and create a domain-specific ontology.
Also, it is costly to find domain experts in cybersecurity.</p>
        <p>Domain experts should have practiced in the field or demonstrated sufficient knowledge
and experience. In this work, the first and second authors are graduate researchers in
cybersecurity, and the third author is a cybersecurity expert and instructor who teaches graduate-level
cybersecurity courses at his university.</p>
        <p>To develop the AISecKG ontology, we used the bottom-up approach given in [26]. We
used the lab documents as a reference and then used domain knowledge to design the ontology.
First, we extracted generic entities and relations from the lab documents using the
part-of-speech tagging and dependency parsing provided by spaCy-based [31] named-entity recognition
(NER) and natural language processing (NLP) methods. The entities extracted using NER are
the subject-object pairs, and the relations are the predicates in the sentences. These
entity-relation-entity triples are not specific to cybersecurity, but they help break down the long texts into
simple graph-like structures and create a preliminary visual representation of information. They
serve as a good reference point for domain experts. This step semi-automated the ontology
construction process and significantly reduced time and effort. It helped in discovering the
schematic and semantic relationships of core entities.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Key Entities</title>
          <p>The cybersecurity ecosystem essentially has three foundational pillars, namely, concept,
application, and role. We can classify concepts into features, functions, data, attacks, vulnerabilities,
and techniques. In addition to defensive and attack methods, the techniques here include
security policies and management processes. The application denotes the tools, systems, and
apps. The user, attacker, and securityTeam are the three roles. Thus in our ontology, we defined
these three categories with 12 types of entities.</p>
          <p>Figure 1 depicts the cybersecurity education ecosystem with each category and entity type.
The attributes or metadata considered for the entities are entityID, entityName, entityType, and
entityCategory. Examples of each entity type within its category are shown
in Table 2.</p>
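          <p>The entity attributes and the category-to-type grouping described above can be sketched as a small record type. This is only an illustration, not the authors' implementation: the class name, the singular type spellings (taken from the schema tuples), and the example entityID value are assumptions.</p>
          <preformat><![CDATA[
```python
from dataclasses import dataclass

# Category -> entity types as described in Section 3.2.1 (3 categories, 12 types).
# The singular spellings are an assumption for this sketch.
ENTITY_TYPES = {
    "concept": ["feature", "function", "data", "attack", "vulnerability", "technique"],
    "application": ["tool", "system", "app"],
    "role": ["user", "attacker", "securityTeam"],
}

# Reverse lookup: entity type -> category.
TYPE_TO_CATEGORY = {t: c for c, types in ENTITY_TYPES.items() for t in types}


@dataclass(frozen=True)
class Entity:
    entityID: int
    entityName: str
    entityType: str

    def __post_init__(self):
        # Reject types that are not part of the ontology.
        if self.entityType not in TYPE_TO_CATEGORY:
            raise ValueError(f"unknown entity type: {self.entityType}")

    @property
    def entityCategory(self) -> str:
        # The category is derived from the type, so the two cannot disagree.
        return TYPE_TO_CATEGORY[self.entityType]


# Hypothetical entry; the ID value is made up for illustration.
nmap = Entity(entityID=1, entityName="Nmap", entityType="tool")
print(nmap.entityCategory)
```
]]></preformat>
          <p>Deriving entityCategory from entityType, rather than storing both, is a design choice of this sketch that keeps the two attributes consistent.</p>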
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Relations</title>
          <p>We used the nine most common and appropriate relations to represent the real-world interactions
between cybersecurity entities. Table 3 shows the relations along with examples from our
dataset as entity-relation-entity triples.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Cybersecurity Schema</title>
          <p>We now present the schema design for learning cybersecurity. We illustrate the interactions
between different components from the perspective of the roles. Figure 2 shows the user's
view. It depicts how users routinely use the data, applications, and systems. The systems and
apps, in turn, use different tools for their usual operations and defensive techniques to monitor
and analyze the environment. The icons in the diagram represent the respective entities, and
the labeled edges show the relationships between different entities.</p>
          <p>Figure 3 gives the attacker view. Applications can expose vulnerabilities, which the
attacker can exploit using various tools and attack techniques. The attacker and the attacks
can then harm the data and applications.</p>
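          <p>The third view is the security view. Figure 4 shows how the security team uses tools and
defensive techniques to analyze and detect vulnerabilities and attacks.</p>
          <table-wrap>
            <label>Table 3</label>
            <table>
              <thead>
                <tr><th>Entity</th><th>Relation</th><th>Entity</th></tr>
              </thead>
              <tbody>
                <tr><td>Nmap</td><td>has_a</td><td>network mapper</td></tr>
                <tr><td>Packet Decoder</td><td>can_analyze</td><td>header anomaly</td></tr>
                <tr><td>Intel CPU</td><td>can_expose</td><td>CVE-2017-5754</td></tr>
                <tr><td>Attack host</td><td>can_exploit</td><td>TCP syn packet</td></tr>
                <tr><td>Network administrators</td><td>implements</td><td>map</td></tr>
                <tr><td>Team defense</td><td>uses</td><td>firewall</td></tr>
                <tr><td>Attack</td><td>can_harm</td><td>target host</td></tr>
                <tr><td>Full scan</td><td>can_detect</td><td>Trojan horses</td></tr>
                <tr><td>Metasploit Framework</td><td>is_part_of</td><td>Kali Linux</td></tr>
              </tbody>
            </table>
          </table-wrap>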
          <p>The three views shown in Figures 2, 3, and 4 give the landscape of cybersecurity concepts, tools,
techniques, systems, and policies that a novice learner needs to gain an understanding
of the domain. The AISecKG ontology identifies 68 schema edges, or interactions, among the 12 types
of entities shown in the respective figures.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. AISecKG Dataset Annotation</title>
        <p>Using the AISecKG ontology, we identified 964 unique cybersecurity-related entities from the
course materials. There are 12 entity types in three categories in the ontology. We labeled the
attributes entity ID, entity type, and entity category for each entity and created an entity
info list. To train the model to predict custom cybersecurity-related entities, we created the
annotated dataset using the BIO (Beginning-Inside-Outside) sequence tagging scheme [32]. The
entity boundary is defined by the tags 'B' and 'I', which mark the beginning and the inside of an entity. All
words that are not part of an entity are labeled 'O'. The lab documents were first split into sentences
using a simple Python script. There were 593 sentences, and 2354 entities were annotated in
these sentences. Code and commands were discarded from the text. The annotation was
done by the first and second authors and was validated by the third author.</p>
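        <p>As a minimal sketch of the BIO scheme described above, the function below turns token-level entity spans into one tag per token. The span format, the "B-type"/"I-type" tag spelling, and the entity types assigned to the example phrases are assumptions made for illustration; they are not taken from the released dataset.</p>
        <preformat><![CDATA[
```python
def bio_tags(tokens, spans):
    """Return one BIO tag per token.

    spans: list of (start, end, entity_type) token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)  # every token is Outside by default
    for start, end, entity_type in spans:
        tags[start] = f"B-{entity_type}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{entity_type}"  # remaining tokens of the entity
    return tags


# Example sentence built from a Table 3 triple; the types are hypothetical.
tokens = ["Metasploit", "Framework", "is", "part", "of", "Kali", "Linux"]
spans = [(0, 2, "tool"), (5, 7, "system")]
for token, tag in zip(tokens, bio_tags(tokens, spans)):
    print(f"{token}\t{tag}")
```
]]></preformat>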
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Applications of AISecKG</title>
      <sec id="sec-4-1">
        <title>4.1. NLP Language Model to Extract custom Named-Entities</title>
        <p>Here we present the first application of AISecKG. Using our AISecKG dataset and its
named-entity annotations, automated systems can be developed to identify named entities
in public texts related to the cybersecurity education ecosystem.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Dataset Preparation</title>
          <p>We split the AISecKG annotated dataset into train, dev, and test keeping 3, 1, and 2 documents,
respectively. The train, dev, and test splits contain 5772, 3591, and 195 entities in 372, 214, and 13
sentences, respectively. We keep an empty line as a separator for each cybersecurity sentence.
This dataset is provided as input to each model.</p>
        </sec>
        <sec>
          <title>4.1.2. Models</title>
          <p>We experiment with six variations of two transformer-based language models: BERT [33] and
RoBERTa [34]. For BERT, we use cased and uncased versions of the base (110M parameters) and
large (340M parameters) variations, and for RoBERTa, we use both the base (125M parameters)
and large (355M parameters) models. The BERT-base and RoBERTa-base architectures have
12 layers, 12 attention heads, and 768 hidden dimensions, whereas both BERT-large and
RoBERTa-large have 24 layers, 16 attention heads, and 1024 hidden dimensions.</p>
          <p>First, the model tokenizes the input sentence and generates embeddings of the tokens. We
then follow the sequence labeling approach of the language models, that is, classifying each
token of a given sentence into one of 25 classes (12 entity types, each with B and I tags, plus
O representing other). We aggregate the contiguous beginning and inside
tokens into entities. In this approach, we not only extract the entities but also identify
their types.</p>
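          <p>The label set and the aggregation step can be illustrated as follows. This is a sketch under assumptions rather than the authors' code: the tag spelling, the handling of a stray I tag (treated like O), and the types in the example are choices made here for illustration.</p>
          <preformat><![CDATA[
```python
ENTITY_TYPES = [
    "feature", "function", "data", "attack", "vulnerability", "technique",
    "tool", "system", "app", "user", "attacker", "securityTeam",
]

# 12 types x {B, I} + O = 25 token classes.
LABELS = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]


def decode_entities(tokens, tags):
    """Merge contiguous B/I tokens into (entity text, entity type) pairs."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and current and entity_type == current_type:
            current.append(token)  # continue the open entity
            continue
        if current:  # any other tag closes the open entity
            entities.append((" ".join(current), current_type))
        if prefix == "B":
            current, current_type = [token], entity_type
        else:  # "O", or an "I" with no matching open entity
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


# Hypothetical predictions for a sentence based on a Table 3 triple.
tokens = ["Full", "scan", "can", "detect", "Trojan", "horses"]
tags = ["B-function", "I-function", "O", "O", "B-attack", "I-attack"]
print(len(LABELS))
print(decode_entities(tokens, tags))
```
]]></preformat>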
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.3. NER Results</title>
          <p>We train each model for 30 epochs with a maximum sequence length of 128 and a per-GPU batch size
of 32. Table 4 shows the performance of our sequence classification on the test set. It can
be seen from the table that case-sensitive BERT performs best on all the metrics. This suggests
that case sensitivity positively impacts the cybersecurity NER model. As expected, the smaller
versions of the models perform worse than their larger counterparts
because they have fewer parameters. Our accuracy in predicting the entities is over 80%
across all the models. Our precision, recall, and F1 scores are reasonable, given the small number of
training samples and the many diverse class categories. This shows the effectiveness of our model
in identifying the entities defined in our ontology from cybersecurity texts.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Triples for Knowledge Graph</title>
        <p>The second application uses the annotated dataset to create triples for the knowledge graphs.
Triples are a way to store graph data in the form 'entity-relation-entity', where the
entities represent the nodes and the relation represents the labeled edge. The triples data can
be used to construct knowledge graphs that provide a visual representation. For this work, since
the focus is to provide learning aids to students, we build visual concept graphs from the lab
documents using these triples. The concept graphs help break down complex information
and allow the students to visually analyze the underlying concepts and the interconnections
between different concepts.</p>
        <p>The constraints and rules for creating edges were defined based on the schema definition of the
AISecKG ontology. There are 68 schema edges between the 12 entity types as per the schema
given in Section 3. We use the annotated sentences from the lab documents. The relations between
labeled entities in each sentence were extracted automatically by matching with the tuples in the
ontology. We manually validated the triples and removed the redundant and ambiguous ones.
Around 812 triples were auto-generated, and validation reduced these to 730 triples in
the final dataset. Table 5 gives a list of sample triples from the dataset for each tuple in our
ontology.</p>
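        <p>One way to read the matching step is sketched below: every ordered pair of labeled entities in a sentence is checked against the schema tuples, and a candidate triple is emitted for each match. The exact procedure used to build the dataset is not spelled out in the text, so the function, its name, and the four-tuple schema excerpt are illustrative assumptions.</p>
        <preformat><![CDATA[
```python
from itertools import permutations

# Small excerpt of the schema tuples (entity type, relation, entity type);
# the full ontology defines 68 such edges.
SCHEMA = {
    ("tool", "is_part_of", "system"),
    ("system", "has_a", "tool"),
    ("tool", "can_detect", "attack"),
    ("attack", "can_harm", "system"),
}


def candidate_triples(entities, schema=SCHEMA):
    """Propose triples for one sentence.

    entities: list of (entity name, entity type) pairs labeled in the sentence.
    """
    triples = []
    for (head, head_type), (tail, tail_type) in permutations(entities, 2):
        for s_type, relation, o_type in sorted(schema):
            if (s_type, o_type) == (head_type, tail_type):
                triples.append((head, relation, tail))
    return triples


# Hypothetical type labels for two entities from Table 3.
sentence_entities = [("Metasploit Framework", "tool"), ("Kali Linux", "system")]
for triple in candidate_triples(sentence_entities):
    print(triple)
```
]]></preformat>
        <p>Type matching alone proposes both directions here (is_part_of and has_a), which illustrates why a manual validation pass to remove redundant and ambiguous triples is useful.</p>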
        <p>Figure 5 shows one of the sub-graphs generated using a subset of the triples. This graph
is the knowledge graph for the NMap tool. Visual graphs related to a specific entity can
be created by writing graph queries. Any graph database, such as an RDF store or Neo4j, can be
used, and graph query languages like GraphQL, SPARQL, or Cypher can query and generate
the graphs [35]. We used the NetworkX library in Python to generate the graph. We
store the triples in a CSV file to make them publicly available. The triple dataset, annotated data,
and implementation code for both applications are available in our GitHub repository<sup>1</sup>. Thus
the ontology and labeled entities in AISecKG can be used to create knowledge graphs from
any unstructured texts on cybersecurity by extracting the cybersecurity-related named entities
using the model and the relations per the ontology.</p>
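        <p>A minimal sketch of selecting an entity-centred sub-graph from triples stored as CSV is shown below. It uses only the Python standard library instead of NetworkX so that it stays self-contained; the column names (head, relation, tail) and the inline sample rows, which reuse Table 3 examples, are assumptions and may not match the layout of the released file.</p>
        <preformat><![CDATA[
```python
import csv
import io

# Stand-in for the triples CSV file; the header names are assumed.
CSV_TEXT = """head,relation,tail
Nmap,has_a,network mapper
Metasploit Framework,is_part_of,Kali Linux
Full scan,can_detect,Trojan horses
"""


def load_triples(handle):
    """Read (head, relation, tail) triples from an open CSV file object."""
    return [(row["head"], row["relation"], row["tail"]) for row in csv.DictReader(handle)]


def subgraph(triples, entity):
    """Return the triples in which `entity` is the head or the tail."""
    return [t for t in triples if entity in (t[0], t[2])]


triples = load_triples(io.StringIO(CSV_TEXT))
for head, relation, tail in subgraph(triples, "Nmap"):
    print(f"{head} -[{relation}]-> {tail}")
```
]]></preformat>
        <p>The filtered triples can then be handed to any graph library or database for layout and rendering.</p>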
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Works</title>
      <p>In this work, we present a novel ontology for cybersecurity education, AISecKG, and show
how this ontology supports building self-paced AI-based learning tools for cybersecurity
learners. More research is needed in this direction, as these tools can be crucial in preparing
the cybersecurity specialist workforce. Additionally, we introduce a manually annotated named-entity
dataset based on the ontology. We also show how AISecKG can be used in downstream
tasks. First, we present how language models can be trained with our annotated dataset
to extract cybersecurity-related named entities from cybersecurity documents. There is
limited work [36] on extracting such information from public forum cybersecurity learning
materials written by professionals for novice vulnerability researchers. We want to extend
this work beyond lab manuals to cybersecurity educational texts in public forums. Second,
we show the process of creating triples by automatically extracting the relations based on the
schema definition given by the ontology. We present one application, the construction of
knowledge graphs from triple data for concept visualization. Other downstream applications,
like question-answering systems and learning recommendation systems, can be built using the
triple dataset.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We are thankful to the National Science Foundation for supporting this research work under
Grant No. 2114789. We would also like to acknowledge Dijiang Huang for his vision and guidance.</p>
      <p><sup>1</sup> https://github.com/garima0106/AISecKG-cybersecurity-dataset.git</p>
      <p><bold>References</bold></p>
      <p>on Big Data (Big Data), IEEE, 2020, pp. 2596–2604.</p>
      <p>[23] Y. Deng, D. Lu, D. Huang, C.-J. Chung, F. Lin, Knowledge graph based learning guidance for
cybersecurity hands-on labs, in: Proceedings of the ACM conference on global computing
education, 2019, pp. 194–200.</p>
      <p>[24] Y. Deng, Z. Zeng, K. Jha, D. Huang, Problem-based cybersecurity lab with knowledge
graph as guidance, Journal of Artificial Intelligence and Technology 2 (2022) 55–61.</p>
      <p>[25] Y. Deng, Z. Zeng, D. Huang, Neocyberkg: enhancing cybersecurity laboratories with a
machine learning-enabled knowledge graph, in: Proceedings of the 26th ACM Conference
on Innovation and Technology in Computer Science Education V. 1, 2021, pp. 310–316.</p>
      <p>[26] G. Agrawal, Y. Deng, J. Park, H. Liu, Y.-C. Chen, Building knowledge graphs from
unstructured texts: Applications and impact analyses in cybersecurity education, Information 13
(2022) 526.</p>
      <p>[27] D. Craigen, N. Diakun-Thibault, R. Purse, Defining cybersecurity, Technology Innovation
Management Review 4 (2014).</p>
      <p>[28] M. Roesch, et al., Snort: Lightweight intrusion detection for networks, in: Lisa, volume 99,
1999, pp. 229–238.</p>
      <p>[29] L. Ehrlinger, W. Wöß, Towards a definition of knowledge graphs, SEMANTiCS (Posters,
Demos, SuCCESS) 48 (2016) 2.</p>
      <p>[30] M. Kejriwal, Domain-specific knowledge graph construction, Springer, 2019.</p>
      <p>[31] Y. Vasiliev, Natural language processing with Python and spaCy: A practical introduction,
No Starch Press, 2020.</p>
      <p>[32] E. F. Sang, J. Veenstra, Representing text chunks, arXiv preprint cs/9907006 (1999).</p>
      <p>[33] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).</p>
      <p>[34] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer,
V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint
arXiv:1907.11692 (2019).</p>
      <p>[35] R. Angles, M. Arenas, P. Barceló, A. Hogan, J. Reutter, D. Vrgoč, Foundations of modern
query languages for graph databases, ACM Computing Surveys (CSUR) 50 (2017) 1–40.</p>
      <p>[36] K. K. Pal, K. Kashihara, P. Banerjee, S. Mishra, R. Wang, C. Baral, Constructing flow graphs
from procedural cybersecurity texts, arXiv preprint arXiv:2105.14357 (2021).</p>
      <p>User:
(user uses app)
(user uses system)
(user uses data)
(app uses data)
(system uses data)
(system has_a tool)
(app has_a tool)
(tool is_part_of system)
(tool is_part_of app)
(tool is_part_of tool)
(tool has_a tool)
(tool has_a function)
(tool has_a feature)
(tool uses technique)
(technique has_a tool)
(technique can_analyze system)
(technique can_analyze app)
(technique can_analyze data)
(function has_a feature)
(feature is_part_of tool)
(system has_a feature)
(app has_a feature)
(feature uses data)
(app can_expose vulnerability)
(system can_expose vulnerability)
(data can_expose vulnerability)
(feature can_expose vulnerability)</p>
      <p>attacker:
(attacker can_exploit vulnerability)
(attacker uses feature)
(attacker uses function)
(attacker uses tool)
(attacker implements attack)
(attacker uses technique)
(technique implements attack)
(attacker can_harm app)
(attacker can_harm data)
(attacker can_harm system)
(attack is_part_of attack)
(attack can_harm system)
(attack can_harm app)
(attack can_harm data)</p>
      <p>securityTeam:
(securityTeam can_analyze app)
(securityTeam can_analyze data)
(securityTeam can_analyze system)
(securityTeam uses tool)
(securityTeam implements function)
(securityTeam can_analyze feature)
(securityTeam uses technique)
(securityTeam can_detect vulnerability)
(securityTeam can_analyze attack)
(tool can_analyze system)
(tool can_analyze app)
(tool can_analyze data)
(tool can_analyze vulnerability)
(tool can_detect attack)
(tool can_analyze feature)
(function can_analyze vulnerability)
(function can_analyze system)
(function can_analyze app)
(function can_analyze data)
(function can_detect attack)
(feature can_analyze system)
(feature can_analyze app)
(feature can_analyze data)
(feature can_analyze vulnerability)
(feature can_detect attack)
(technique can_analyze vulnerability)
(technique can_detect attack)</p>
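      <p>The triple patterns listed above can be read as a type-level schema: each one says which entity type may be linked to which other type by a given relation. The following is a minimal sketch, not the authors' implementation, of how such patterns might be used to filter candidate triples. It assumes only a small subset of the patterns; the helper names (is_valid, relations_by_head) and the instance-level entities are hypothetical and chosen for illustration.</p>
      <preformat>
```python
from collections import defaultdict

# A small subset of the (head type, relation, tail type) patterns listed above.
SCHEMA = {
    ("attacker", "can_exploit", "vulnerability"),
    ("attacker", "uses", "tool"),
    ("tool", "can_detect", "attack"),
    ("securityTeam", "uses", "tool"),
    ("attack", "can_harm", "system"),
}


def is_valid(head_type, relation, tail_type, schema=SCHEMA):
    """Return True if the typed triple matches one of the schema patterns."""
    return (head_type, relation, tail_type) in schema


def relations_by_head(schema=SCHEMA):
    """Group the allowed (relation, tail type) pairs by head type."""
    grouped = defaultdict(set)
    for head, rel, tail in schema:
        grouped[head].add((rel, tail))
    return dict(grouped)


# Hypothetical instance-level candidates: ((entity, type), relation, (entity, type)).
# The entity names are illustrative only and are not taken from the dataset.
candidates = [
    (("snort", "tool"), "can_detect", ("port scan", "attack")),
    (("snort", "tool"), "can_harm", ("web server", "system")),
]

# Keep only the candidates whose type signature appears in the schema.
accepted = [
    (h[0], r, t[0]) for h, r, t in candidates if is_valid(h[1], r, t[1])
]
print(accepted)
```
      </preformat>
      <p>In this sketch the second candidate is rejected because (tool can_harm system) is not among the listed patterns. A set of tuples keeps the membership check constant-time; a graph library could be substituted if traversal is needed.</p>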
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nashon</surname>
          </string-name>
          ,
          <article-title>Development of an instrument designed to investigate elements of science students' metacognition, self-efficacy and learning processes: The SEMLI-S</article-title>
          ,
          <source>International Journal of Science Education</source>
          <volume>30</volume>
          (
          <year>2008</year>
          )
          <fpage>1701</fpage>
          -
          <lpage>1724</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>BRIEF</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. LY</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S. E. A.</given-names>
            <surname>ION</surname>
          </string-name>
          ,
          <article-title>A framework for K-12 science education: Practices, crosscutting concepts, and core ideas</article-title>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Angle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>JEDAI: A system for skill-aligned explainable robot planning</article-title>
          ,
          <source>in: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems</source>
          , AAMAS '22,
          <publisher-name>International Foundation for Autonomous Agents and Multiagent Systems</publisher-name>
          ,
          <publisher-loc>Richland, SC</publisher-loc>
          ,
          <year>2022</year>
          , pp.
          <fpage>1917</fpage>
          -
          <lpage>1919</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fettach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghogho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Benatallah</surname>
          </string-name>
          ,
          <article-title>Knowledge graphs in education and employability: A survey on applications and techniques</article-title>
          ,
          <source>IEEE Access</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
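          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>d'Amato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>de Melo</surname>
          </string-name>
          ,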
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Souag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Salinesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mazo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Comyn-Wattiau</surname>
          </string-name>
          ,
          <article-title>A security ontology for security requirements elicitation</article-title>
          , in: ESSoS, Springer,
          <year>2015</year>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Syed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Padia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Finin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mathews</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <article-title>UCO: A unified cybersecurity ontology</article-title>
          ,
          <source>UMBC Student Collection</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Doynikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fedorchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kotenko</surname>
          </string-name>
          ,
          <article-title>Ontology of metrics for cyber security assessment</article-title>
          ,
          <source>in: Proceedings of the 14th International Conference on Availability, Reliability and Security</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekelhart</surname>
          </string-name>
          ,
          <article-title>Formalizing information security knowledge</article-title>
          ,
          <source>in: Proceedings of the 4th International Symposium on Information, Computer, and Communications Security</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Mozzaquatro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Agostinho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jardim-Goncalves</surname>
          </string-name>
          ,
          <article-title>An ontology-based cybersecurity framework for the internet of things</article-title>
          ,
          <source>Sensors</source>
          <volume>18</volume>
          (
          <year>2018</year>
          )
          <fpage>3053</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Iannacone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nakamura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gerth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huffer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bridges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ferragut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Goodall</surname>
          </string-name>
          ,
          <article-title>Developing an ontology for cyber security knowledge graphs</article-title>
          ,
          <source>in: Proceedings of the 10th Annual Cyber and Information Security Research Conference</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Rastogi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dutta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gittens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <article-title>MALOnt: An ontology for malware threat intelligence</article-title>
          ,
          <source>in: Deployable Machine Learning for Security Defense: First International Workshop, MLHat 2020, San Diego, CA, USA, August 24, 2020, Proceedings 1</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B. F.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Reyes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. I.</given-names>
            <surname>Panach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pastor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rochwerger</surname>
          </string-name>
          ,
          <article-title>Conceptual characterization of cybersecurity ontologies</article-title>
          ,
          <source>in: The Practice of Enterprise Modeling: 13th IFIP Working Conference, PoEM 2020, Riga, Latvia, November 25-27, 2020, Proceedings 13</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>323</fpage>
          -
          <lpage>338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alshaibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Al-Ani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Azzawi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Konev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelupanov</surname>
          </string-name>
          ,
          <article-title>The comparison of cybersecurity datasets</article-title>
          ,
          <source>Data</source>
          <volume>7</volume>
          (
          <year>2022</year>
          )
          <fpage>22</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H. M. D.</given-names>
            <surname>Trong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-T.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P. B.</given-names>
            <surname>Veyseh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <article-title>Introducing a new dataset for event detection in cybersecurity texts</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>5381</fpage>
          -
          <lpage>5390</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bhusal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rastogi</surname>
          </string-name>
          ,
          <article-title>CyNER: A Python library for cybersecurity named entity recognition</article-title>
          ,
          <source>arXiv preprint arXiv:2204.05754</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Data and knowledge-driven named entity recognition for cyber security</article-title>
          ,
          <source>Cybersecurity</source>
          <volume>4</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tikhomirov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Loukachevitch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sirotina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dobrov</surname>
          </string-name>
          ,
          <article-title>Using BERT and augmentation in named entity recognition for cybersecurity domain</article-title>
          ,
          <source>in: Natural Language Processing and Information Systems: 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Saarbrücken, Germany, June 24-26, 2020, Proceedings 25</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <article-title>Cybersecurity named entity recognition using bidirectional long short-term memory with conditional random fields</article-title>
          ,
          <source>Tsinghua Science and Technology</source>
          <volume>26</volume>
          (
          <year>2020</year>
          )
          <fpage>259</fpage>
          -
          <lpage>265</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C. V.</given-names>
            <surname>CVE</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Exposures</surname>
          </string-name>
          , URL http://cve.mitre.org, accessed in January (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>C.</given-names>
            <surname>MITRE</surname>
          </string-name>
          ,
          <article-title>National vulnerability database (NVD)</article-title>
          , https://nvd.nist.gov/ (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dasgupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piplai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kotal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <article-title>A comparative study of deep learning based named entity recognition algorithms for cybersecurity</article-title>
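          ,
          <source>in: 2020 IEEE International Conference on Big Data (Big Data)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>2596</fpage>
          -
          <lpage>2604</lpage>
          .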
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>