<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NIST data leakage case</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ľubomír Antoni</string-name>
          <email>lubomir.antoni@upjs.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavol Sokol</string-name>
          <email>pavol.sokol@upjs.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sophia Petra Krišáková</string-name>
          <email>sophia.petra.krisakova@student.upjs.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dominika Kotlárová</string-name>
          <email>dominika.kotlarova@student.upjs.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ondrej Krídlo</string-name>
          <email>ondrej.kridlo@upjs.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stanislav Krajči</string-name>
          <email>stanislav.krajci@upjs.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computer Science, Faculty of Science, Pavol Jozef Šafárik University in Košice</institution>
          ,
          <addr-line>Jesenná 5, 040 01 Košice</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Security incidents involving unauthorized access and data leakage remain one of the most critical challenges in modern cybersecurity, demanding transparent and interpretable methods for behavioral analysis. This paper presents an application of Formal concept analysis and association rule mining to the NIST Data Leakage Case EVT dataset, which simulates a security incident involving potential data exfiltration. Our objective is to uncover and interpret logical dependencies among binary audit attributes and time-based features extracted from system event logs. Using Formal concept analysis, we construct a concept lattice that visualizes the hierarchical structure of attribute co-occurrences and reveals frequent as well as rare behavioral patterns. From the underlying formal context, we derive a canonical base of attribute implications with 100% confidence and extend this with approximate association rules to capture near-deterministic relationships. The results highlight both typical access configurations and anomalous combinations involving weekend and night-time activity. Our findings demonstrate that Formal concept analysis provides a transparent and systematic framework for reasoning over event data, offering interpretable support for anomaly detection and forensic analysis in cybersecurity contexts.</p>
      </abstract>
      <kwd-group>
        <kwd>Formal concept analysis</kwd>
        <kwd>cybersecurity</kwd>
        <kwd>event analysis</kwd>
        <kwd>concept lattice</kwd>
        <kwd>attribute implication</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Insider threats, particularly those involving intentional data exfiltration by trusted employees, represent
a persistent and complex challenge for cybersecurity systems. Traditional signature-based detection
methods often fail to identify subtle behavioral patterns that precede such incidents. To address this gap,
high-quality, realistic datasets are crucial for developing and evaluating behavioral and anomaly-based
detection models. Behavioral modeling approaches, such as those based on access logs, user activity
profiling, and context-aware policies, have been proposed as more effective strategies for identifying
insider misuse [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. Building on the NIST Data Leakage Case introduced below, a derived dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] suitable for various machine learning
and data analysis methods was created. This dataset is also applicable in the area of formal concept
analysis. A more detailed description of the dataset is provided in Section 3.
      </p>
      <p>
        However, progress in this area strongly depends on the availability of comprehensive and semantically
rich datasets that simulate real-world security contexts. The NIST Data Leakage Test Case is a publicly
available dataset designed by the National Institute of Standards and Technology to support research in
the detection of insider threats and unauthorized data exfiltration. It simulates user behavior within
a fictional organization, incorporating realistic event logs, file access traces, and security-relevant
metadata that culminate in a staged data leakage scenario. The dataset was created to evaluate behavioral
detection methods, policy enforcement mechanisms, and forensic analysis tools in cybersecurity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Formal concept analysis is a set of mathematical methods grounded in lattice theory and propositional
logic, designed to identify and analyze relationships within structured data. As an unsupervised
biclustering approach, Formal concept analysis simultaneously groups both objects and their attributes,
forming formal concepts, i.e., pairs consisting of a set of objects (extent) and a set of shared attributes
(intent). These concepts are organized into a hierarchical structure called a concept lattice, which
reveals inherent patterns, dependencies, and generalization-specialization relationships within the
dataset [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
      </p>
      <p>
        Formal concept analysis provides a robust framework for exploratory data analysis, knowledge
discovery, and fuzzy extensions, particularly in domains where the interpretation of attribute
co-occurrence is essential. By uncovering hidden regularities and highlighting meaningful associations,
Formal concept analysis supports informed decision-making and deepens the understanding of complex
relational structures [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8, 9, 10, 11</xref>
        ].
      </p>
      <p>The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces
the dataset and outlines the preprocessing steps applied. In Section 4, we describe the proposed methodology,
which integrates Formal concept analysis and the extraction of attribute dependencies. Section 5 presents the
experimental results along with their interpretation and discussion, and Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        In this paper, we focus on the use of Formal concept analysis for solving cybersecurity incidents and
conducting digital forensic analysis. In this sense, the paper complements the papers [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]. In the
paper [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], the authors focus on analysing meaningful groups of digital objects based on common
attributes and visualising the hierarchy of concepts. They describe the formal context derived from
digital traces collected from the NTFS file system and present several concept lattices enriched with
association rules. Their research is further expanded in the paper [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], where the authors generate four
concept lattices for different subsets of attributes (timestamps, file types, etc.) and compare several
association rule mining methods, while also interpreting fuzzy implications in the context of Formal
concept analysis. Digital forensic analysis of cybersecurity incidents involving social engineering is
also addressed in the paper [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Cybersecurity covers a broad range of different topics, as reflected in the related work. Formal
concept analysis can serve as a handy tool for detecting and retrieving various criminal activities
committed in cyberspace. In the paper [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the authors develop an explicitly defined and conceptual
system for analysing e-fraud data in cyberspace.
      </p>
      <p>
        The use of Formal concept analysis for malware analysis is also of interest. In the paper [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], the
authors propose F-FCA (Feature-driven Formal concept analysis), in which each object and concept
is associated with a temporal logic formula. They also introduce the FOCA algorithm to generate the
concept hierarchy using an object-joining operator. Experiments on a real dataset of 3,000 malware
samples demonstrate the effectiveness of the proposed approach compared to traditional Formal concept
analysis. Malware analysis using Formal concept analysis is also discussed in the paper [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], where the
authors argue that creating a standard naming convention and hierarchy for malware is important to
improve collaboration and information sharing in this field.
      </p>
      <p>
        Security governance is an important part of cybersecurity, and examples of the use of FCA can also
be found in this area. In the paper [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], the authors present a systematic synthesis of the General
Data Protection Regulation (GDPR) using Formal concept analysis. Based on its principles, the GDPR
is synthesised into a concept lattice containing 144,372 records. This can be used, for example, to
identify implicit logical relations within the regulation and their intensity. These results can support
(re)design, development, operation, or refactoring of information systems towards a higher level of
GDPR compliance. Threat and threat-agent analysis is also part of cybersecurity management. In the
paper [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], Formal concept analysis was used to uncover deeper relationships in the MITRE taxonomic
framework. The results of the exploratory analysis were then encoded into an ontology using the OWL
language, allowing logical reasoning over the relationships between cyber techniques and procedures.
      </p>
      <p>
        Finally, asset protection and the implementation of security measures are also part of cybersecurity.
In this respect, the paper [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] proposes a new heuristic approach based on Formal concept analysis for
improving the security and privacy protection of sensitive e-Health information through item-hiding
techniques. The proposed FACHS method minimises side effects and distortion of the original database
while not requiring preliminary frequent itemset mining.
      </p>
      <p>
        The above shows that Formal concept analysis represents one of the possible approaches to solving
various cybersecurity problems. The research presented in this paper builds on the work reported in
the papers [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]. In contrast to those papers, the present work focuses on a different type of forensic
artefact. In addition, the presented research also focuses on temporal data, as also explored in the
paper [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset and its preprocessing</title>
      <p>
        The NIST Data Leakage Case [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] offers a unique resource for researchers and practitioners in this area.
Published by the National Institute of Standards and Technology (NIST) as part of its Data Exfiltration
Test Cases, the dataset simulates user behavior in a fictional organization where a data leakage incident
gradually unfolds. It includes detailed Windows event logs, file system activity, access records, and
metadata annotations. The NIST Data Leakage Case EVT dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], created from it, captures both normal
and malicious activity, making it suitable for supervised and unsupervised machine learning, as well as
exploratory data analysis.
      </p>
      <p>This dataset enables the study of temporal patterns, user behavior profiling, privilege escalation, and
unauthorized access attempts. Its structure supports tasks such as clustering, anomaly detection, and
association rule mining, particularly using its rich set of binary audit attributes and event metadata.</p>
      <p>To investigate behavioral patterns and detect potential anomalies in system activity, we selected a
set of binary attributes from the NIST Data Leakage Case EVT dataset. From the full event log, we
extracted nine binary features representing audit categories and log metadata (Table 1). These were
complemented by two newly engineered time-based features.</p>
      <p>In addition to these system-generated binary indicators, two custom binary time-based attributes
were created:</p>
      <list list-type="bullet">
        <list-item>
          <p>is_night_activity: equals 1 if the event occurred between 00:00 and 05:59 UTC (nighttime
hours), and 0 otherwise;</p>
        </list-item>
        <list-item>
          <p>is_weekend: equals 1 if the event occurred on a Saturday or Sunday, and 0 otherwise.</p>
        </list-item>
      </list>
      <p>
        The final dataset used for analysis consists of 10,306 rows and 11 binary columns, forming a structured
event-log matrix suitable for unsupervised learning techniques such as bi-clustering and the extraction
of attribute implications, which we describe in the following section. For a more detailed description of
the dataset creation process and attribute specification, see the paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
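      <p>The two time-based attributes can be derived from an event timestamp as in the following minimal sketch.
It is an illustration only, not the preprocessing code used for the dataset: the function name time_features is
hypothetical, timestamps are assumed to be timezone-aware, and the weekend test is assumed to be evaluated in UTC
like the night-time window.</p>

```python
from datetime import datetime, timezone


def time_features(ts: datetime) -> dict:
    """Return the two binary time-based attributes for one event timestamp.

    Assumption: ts is timezone-aware; it is converted to UTC before testing.
    """
    ts_utc = ts.astimezone(timezone.utc)
    return {
        # Night-time window 00:00-05:59 UTC, i.e. hours 0 to 5 inclusive.
        "is_night_activity": int(ts_utc.hour in range(6)),
        # weekday() numbers Monday as 0, so Saturday and Sunday are 5 and 6.
        "is_weekend": int(ts_utc.weekday() >= 5),
    }


# Hypothetical timestamp (a Sunday, 02:30 UTC), not taken from the dataset.
example = datetime(2015, 3, 22, 2, 30, tzinfo=timezone.utc)
print(time_features(example))
```

      <p>Applying such a function to every event and appending the two resulting columns to the nine audit
attributes would yield a binary matrix of the kind described above.</p>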
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>
        In this section, we introduce the basic notions of Formal concept analysis, a mathematical framework
for extracting and representing conceptual structures in data. Formal concept analysis is appropriate
for analyzing binary relations between objects and their attributes, which aligns with the structure of
our dataset. The fundamental building block of Formal concept analysis is the formal context, which
encodes the presence or absence of relationships between elements [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. We begin with the following
formal definition:
Definition 1. Let B and A be non-empty sets representing, respectively, a collection of objects and
attributes, and let I ⊆ B × A be a crisp binary relation indicating which objects are associated with which
attributes. The triple ⟨B, A, I⟩ is termed a formal context. The relation I is referred to as the incidence
relation, capturing the presence of attributes in objects.
      </p>
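      <table-wrap>
        <table>
          <thead>
            <tr><th></th><th>i</th><th>ii</th><th>iii</th></tr>
          </thead>
          <tbody>
            <tr><td>a</td><td>×</td><td></td><td></td></tr>
            <tr><td>b</td><td>×</td><td>×</td><td></td></tr>
            <tr><td>c</td><td></td><td>×</td><td></td></tr>
          </tbody>
        </table>
      </table-wrap>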
      <p>We may interpret a formal context as a binary incidence table, where each entry indicates whether
a given object possesses a particular attribute. To formally capture the structure embedded in such
a context, we now introduce two dual operators that act on subsets of objects and attributes. These
operators serve as the foundation for constructing formal concepts.</p>
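      <p>[Figure: the operator ↗ applied to each subset of the object set {a, b, c} of the example context.]</p>
      <p>Definition 2. Let ⟨B, A, I⟩ be a formal context, and let X ∈ 𝒫(B), Y ∈ 𝒫(A) denote subsets of the sets
of objects and attributes, respectively, where 𝒫 denotes the power set. We define two mappings
↗ : 𝒫(B) → 𝒫(A) and ↘ : 𝒫(A) → 𝒫(B) by
        <disp-formula><tex-math>X^{\nearrow} = \{\, a \in A \mid \forall b \in X,\ \langle b, a \rangle \in I \,\},</tex-math></disp-formula>
        <disp-formula><tex-math>Y^{\searrow} = \{\, b \in B \mid \forall a \in Y,\ \langle b, a \rangle \in I \,\}.</tex-math></disp-formula>
      </p>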
      <p>These mappings are referred to as the concept-forming operators of the formal context ⟨B, A, I⟩. The
operator ↗ yields the set of all attributes common to a given set of objects, while ↘ returns the set of all
objects that share a given set of attributes.</p>
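      <p>The two operators can be illustrated on a small context. The sketch below assumes a toy context with
objects a, b, c and attributes i, ii, iii, in which no object has attribute iii; the names OBJECTS, ATTRIBUTES,
INCIDENCE, up and down are chosen here for illustration and do not come from any particular library.</p>

```python
# Toy context (an assumption for illustration): objects a, b, c and attributes i, ii, iii.
OBJECTS = {"a", "b", "c"}
ATTRIBUTES = {"i", "ii", "iii"}
INCIDENCE = {("a", "i"), ("b", "i"), ("b", "ii"), ("c", "ii")}


def up(objs):
    """Attributes shared by every object in objs (the up-arrow operator)."""
    return frozenset(a for a in ATTRIBUTES if all((b, a) in INCIDENCE for b in objs))


def down(attrs):
    """Objects possessing every attribute in attrs (the down-arrow operator)."""
    return frozenset(b for b in OBJECTS if all((b, a) in INCIDENCE for a in attrs))


print(sorted(up({"a", "b"})))  # ['i']
print(sorted(down({"i"})))     # ['a', 'b']
print(sorted(up(set())))       # ['i', 'ii', 'iii']
```

      <p>Note that the empty set of objects is mapped to the full attribute set, because the universal condition
in the definition is then satisfied vacuously.</p>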
      <p>The concept-forming operators introduced above provide the foundation for defining structured
clusters of data known as formal concepts.</p>
      <p>Definition 3. Let ⟨B, A, I⟩ be a formal context and let ↗, ↘ be the associated concept-forming operators.
For any subsets X ∈ 𝒫(B) and Y ∈ 𝒫(A), a pair ⟨X, Y⟩ is called a formal concept of the context ⟨B, A, I⟩
if and only if it satisfies the conditions X↗ = Y and Y↘ = X.
In this case, the set X is referred to as the extent of the formal concept, representing the collection of objects,
while the set Y is called the intent, representing the set of attributes shared by all objects in X.</p>
      <p>[Figure: in the example context, {a, b}↗ = {i} and {i}↘ = {a, b}, so ⟨{a, b}, {i}⟩ is a formal concept.]</p>
      <p>The collection of all formal concepts derived from a given formal context ⟨B, A, I⟩ is denoted as:
        <disp-formula><tex-math>\mathcal{C}(B, A, I) = \{\, \langle X, Y \rangle \in \mathcal{P}(B) \times \mathcal{P}(A) \mid X^{\nearrow} = Y,\ Y^{\searrow} = X \,\}.</tex-math></disp-formula>
      </p>
      <p>The set of all formal concepts derived from a given context can be naturally equipped with a partial
order based on the inclusion of extents (or, equivalently, reverse inclusion of intents). Under this order,
the set of formal concepts forms a complete lattice structure, known as the concept lattice.
Definition 4. Let ⟨X₁, Y₁⟩, ⟨X₂, Y₂⟩ ∈ C(B, A, I) be two formal concepts of the context ⟨B, A, I⟩.
Define a partial order ⪯ on C(B, A, I) by:
        <disp-formula><tex-math>\langle X_1, Y_1 \rangle \preceq \langle X_2, Y_2 \rangle \quad \text{if and only if} \quad X_1 \subseteq X_2 \quad (\text{equivalently, } Y_2 \subseteq Y_1).</tex-math></disp-formula>
      </p>
      <p>The partially ordered set ⟨C(B, A, I), ⪯⟩ is called the concept lattice of the context ⟨B, A, I⟩. For
convenience, it is typically denoted by CL(B, A, I).</p>
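      <p>For a very small context, all formal concepts and their order can be enumerated by brute force, as in
the following sketch. It assumes a toy context with objects a, b, c and attributes i, ii, iii, closes every subset
of objects, and is exponential in the number of objects; it is meant only to illustrate the definitions and is not
how tools such as Concept Explorer compute lattices. All names in the code are hypothetical.</p>

```python
from itertools import combinations

# Toy context (an assumption for illustration).
OBJECTS = ("a", "b", "c")
ATTRIBUTES = ("i", "ii", "iii")
INCIDENCE = {("a", "i"), ("b", "i"), ("b", "ii"), ("c", "ii")}


def up(objs):
    """Attributes shared by every object in objs."""
    return frozenset(a for a in ATTRIBUTES if all((b, a) in INCIDENCE for b in objs))


def down(attrs):
    """Objects possessing every attribute in attrs."""
    return frozenset(b for b in OBJECTS if all((b, a) in INCIDENCE for a in attrs))


def all_concepts():
    """Close every subset of objects; each pair (down(up(X)), up(X)) is a formal concept."""
    found = set()
    for size in range(len(OBJECTS) + 1):
        for subset in combinations(OBJECTS, size):
            intent = up(subset)
            found.add((down(intent), intent))
    return found


def below(c1, c2):
    """Concept order: c1 is below c2 when the extent of c1 is contained in the extent of c2."""
    return c1[0].issubset(c2[0])


CONCEPTS = all_concepts()
# Print concepts from the largest extent (top of the lattice) to the smallest.
for extent, intent in sorted(CONCEPTS, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

      <p>For this toy context the enumeration yields five concepts, and comparing extents by inclusion gives the
same order as comparing intents by reverse inclusion.</p>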
      <p>[Figure: concept lattice of the example context, consisting of the concepts ⟨{a,b,c}, ∅⟩, ⟨{a,b}, {i}⟩,
⟨{b,c}, {ii}⟩, ⟨{b}, {i,ii}⟩ and ⟨∅, {i,ii,iii}⟩.]</p>
      <p>
        The formal context and its associated concept lattice provide a foundation for discovering attribute
dependencies, also known as attribute implications, that hold within a given dataset [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. These
dependencies capture regularities in how attributes co-occur across the object set. As demonstrated by
Ganter and Wille [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Formal concept analysis can, for instance, be applied to the analysis of feasible
configurations of computer hardware components, where only certain combinations of features are
considered valid. This exemplifies how one may study admissible attribute combinations and the logical
relationships between them.
      </p>
      <p>Such an approach is particularly beneficial in classification tasks where a large set of objects is
characterized by a relatively small number of attributes. In these cases, it becomes advantageous
to represent domain knowledge through attribute implications, i.e., logical formulas stating that the
presence of one subset of attributes guarantees the presence of another.</p>
      <p>Formally, we define attribute implications as follows:
Definition 5. Let A be a non-empty set of attributes and let S, T ⊆ A. An attribute implication over
A is a formal expression of the form S ⇒ T, interpreted as: "every object possessing all attributes in S
also possesses all attributes in T." The implication S ⇒ T is said to be valid in a set M ⊆ A if S ⊆ M
implies T ⊆ M. Equivalently, the implication is valid in M if either S ⊈ M or T ⊆ M.</p>
      <p>We now provide a simple example to illustrate the validity of an attribute implication:
Example 1. Let A = {a₁, a₂, a₃, a₄} be a set of attributes, and let M = {a₁, a₄} ⊆ A. The expression
{a₁, a₃} ⇒ {a₄} is an attribute implication over A. Moreover, it is valid in the subset M, since
{a₁, a₃} ⊈ M, i.e., the premise of the implication does not hold in M.</p>
      <p>We now extend the notion of implication validity from individual subsets of attributes to collections
of such subsets. This generalization enables us to formalize the notion of an attribute implication being
valid across a set of observations or contexts.</p>
      <p>Definition 6. Let A be a non-empty set of attributes, and let S, T ⊆ A. Furthermore, let ℳ ⊆ 𝒫(A)
denote a collection of subsets of attributes. An attribute implication S ⇒ T is said to be valid in ℳ if it is
valid in every set M ∈ ℳ, i.e., for all M ∈ ℳ, S ⊆ M implies T ⊆ M.</p>
      <p>This generalized notion of validity allows us to define the validity of attribute implications within a
formal context by interpreting the context as a collection of attribute sets derived from objects.
Definition 7. Let ⟨B, A, I⟩ be a formal context, and let S, T ⊆ A. An attribute implication S ⇒ T is
said to be valid in the formal context ⟨B, A, I⟩ if it is valid in the collection
        <disp-formula><tex-math>\mathcal{M} = \{\, \{b\}^{\nearrow} \mid b \in B \,\},</tex-math></disp-formula>
where ↗ denotes the concept-forming operator that maps an object to the set of attributes it possesses.</p>
      <p>The set ℳ = {{b}↗ | b ∈ B} thus represents all attribute sets associated with individual objects in
the context. This construction provides a basis for evaluating the validity of any attribute implication
with respect to the entire formal context.</p>
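      <p>Checking validity in a context therefore amounts to checking each object row in turn. The following
sketch illustrates this on an assumed toy context given as a mapping from objects to their attribute sets; the
names ROWS, valid_in_set and valid_in_context are hypothetical.</p>

```python
# Toy context (an assumption for illustration): each object mapped to its attribute set.
ROWS = {
    "a": frozenset({"i"}),
    "b": frozenset({"i", "ii"}),
    "c": frozenset({"ii"}),
}


def valid_in_set(premise, conclusion, m):
    """The implication holds in m when the premise is not contained in m or the conclusion is."""
    return (not set(premise).issubset(m)) or set(conclusion).issubset(m)


def valid_in_context(premise, conclusion, rows):
    """The implication holds in the context when it holds in the attribute set of every object."""
    return all(valid_in_set(premise, conclusion, m) for m in rows.values())


# No object has attribute iii, so the premise never applies and the implication holds.
print(valid_in_context({"iii"}, {"i", "ii"}, ROWS))  # True
# Object a has i but not ii, so this implication is violated.
print(valid_in_context({"i"}, {"ii"}, ROWS))         # False
```

      <p>An implication whose premise is satisfied by no object is valid trivially, which is why rules of this
kind should be read together with their support.</p>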
      <p>
        While this approach allows for checking the validity of selected attribute implications, our objective
is often to compute a complete and non-redundant set of all valid implications that fully characterizes the
data. For this purpose, the Guigues–Duquenne basis (also known as the stem base) offers a canonical and
minimal representation of all valid attribute implications in a formal context [21]. Efficient algorithms
for its computation are described in [
        <xref ref-type="bibr" rid="ref6">6, 22</xref>
        ], and it plays a central role in the logical analysis of data
dependencies.
      </p>
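      <p>To make the idea of the stem base concrete, the following sketch computes it for an assumed toy context
using a NextClosure-style enumeration of intents and pseudo-intents in lectic order. It is a compact
illustration under these assumptions, not the implementation used by Concept Explorer, and all function names are
hypothetical.</p>

```python
# Toy context (an assumption for illustration): attribute sets of three objects.
ATTRIBUTES = ["i", "ii", "iii"]  # fixed linear order used for the lectic enumeration
ROWS = [{"i"}, {"i", "ii"}, {"ii"}]


def context_closure(attrs):
    """Attributes shared by all rows that contain attrs (the closure attrs-down-up)."""
    matching = [row for row in ROWS if attrs.issubset(row)]
    if not matching:
        return set(ATTRIBUTES)  # no row matches, so every attribute follows
    return set.intersection(*matching)


def implication_closure(attrs, implications):
    """Close attrs under the implications, firing only premises that are proper subsets."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            fires = premise.issubset(closed) and premise != closed
            if fires and not conclusion.issubset(closed):
                closed |= conclusion
                changed = True
    return closed


def next_closure(current, implications):
    """Return the lectically next set closed under implication_closure, or None at the end."""
    work = set(current)
    for idx in reversed(range(len(ATTRIBUTES))):
        attr = ATTRIBUTES[idx]
        if attr in work:
            work.discard(attr)
        else:
            candidate = implication_closure(work | {attr}, implications)
            smaller = set(ATTRIBUTES[:idx])
            # Accept only if no attribute preceding attr was newly added.
            if not (candidate - work).intersection(smaller):
                return candidate
    return None


def stem_base():
    """Collect one implication (pseudo-intent, its closure) per pseudo-intent."""
    implications = []
    current = set()
    while current is not None:
        closed = context_closure(current)
        if closed != current:  # current is a pseudo-intent
            implications.append((frozenset(current), frozenset(closed)))
        current = next_closure(current, implications)
    return implications


for premise, conclusion in stem_base():
    print(sorted(premise), "=>", sorted(conclusion - premise))
```

      <p>For this toy context the basis contains a single implication with premise {iii}, reflecting that no
object possesses attribute iii.</p>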
      <p>While attribute implications capture only those relationships that are logically valid in the context (i.e.,
hold in 100% of the objects), a more flexible framework is needed to describe approximate dependencies
that may hold with high but not perfect reliability. This is where the notion of confidence becomes
central [23].</p>
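      <p>Definition 8. Let ⟨B, A, I⟩ be a formal context, and let S, T ⊆ A. The confidence of an implication
S ⇒ T in this context is defined as:
        <disp-formula><tex-math>\mathrm{conf}(S \Rightarrow T) = \frac{|\{\, b \in B \mid S \subseteq \{b\}^{\nearrow} \text{ and } T \subseteq \{b\}^{\nearrow} \,\}|}{|\{\, b \in B \mid S \subseteq \{b\}^{\nearrow} \,\}|}.</tex-math></disp-formula>
That is, confidence is the proportion of objects possessing all attributes in S that also possess all attributes
in T.</p>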
      <p>An attribute implication is thus a special case of an association rule with confidence equal to 100%.
More generally, association rules allow for modeling relationships that are statistically strong but not
universally valid. For example, the rule S ⇒ T with conf(S ⇒ T) = 97%
expresses that in 97% of all objects where S holds, the attributes in T also hold. This framework is
particularly useful in data analysis tasks involving noise, exceptions, or incomplete patterns.</p>
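      <p>The confidence of a rule can be computed directly from the rows of a binary table, as sketched below.
The rows are made up for illustration and are not taken from the NIST dataset; only the attribute names follow
the paper. Support is computed here in the usual association-rule sense, as the fraction of all rows containing
both premise and conclusion, which is an assumption since the text does not define it explicitly.</p>

```python
# Made-up event rows for illustration only; each row is the set of attributes equal to 1.
ROWS = [
    {"is_weekend", "is_night_activity", "keywords_event_log_classic"},
    {"is_weekend", "is_night_activity", "keywords_event_log_classic"},
    {"is_weekend", "is_night_activity", "keywords_event_log_classic"},
    {"is_weekend", "is_night_activity"},
    {"is_weekend", "keywords_event_log_classic"},
    {"keywords_event_log_classic"},
    {"keywords_correlation_hint"},
    set(),
]


def support_and_confidence(premise, conclusion, rows):
    """Return (support, confidence) of the rule premise => conclusion over rows."""
    premise, conclusion = set(premise), set(conclusion)
    covered = [row for row in rows if premise.issubset(row)]
    hits = [row for row in covered if conclusion.issubset(row)]
    support = len(hits) / len(rows)
    # Confidence is undefined when no row satisfies the premise.
    confidence = len(hits) / len(covered) if covered else None
    return support, confidence


# An exact implication: every night-time row is also a weekend row.
print(support_and_confidence({"is_night_activity"}, {"is_weekend"}, ROWS))
# An approximate rule: three of the four weekend-night rows use the classic format.
print(support_and_confidence(
    {"is_weekend", "is_night_activity"}, {"keywords_event_log_classic"}, ROWS))
```

      <p>Rules with confidence equal to 1 correspond to attribute implications valid in the context, while lower
values quantify how often the exceptions occur.</p>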
      <p>In our analysis, we first computed all attribute implications with confidence 100% using the canonical
Guigues–Duquenne basis. We then extended the analysis by relaxing the confidence threshold, thereby
deriving a broader set of association rules that capture near-deterministic behavioral patterns in the
data.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Results and interpretation</title>
      <p>To analyze the structure of dependencies and co-occurrences among binary attributes in the dataset,
we applied Formal concept analysis as described in the previous section. This technique allowed us to
extract a hierarchy of formal concepts based on shared attribute subsets, resulting in a concept lattice
consisting of 26 distinct concepts.</p>
      <p>Figure 5 shows the resulting concept lattice diagram. Each node in the diagram represents a formal
concept, characterized by an extent (the number of objects/rows) and an intent (the set of common
attributes). The extent size and its proportion with respect to the full dataset (10,306 rows) are shown
in each node, e.g., "4862 / 47%" indicates that 47% of the objects share the attributes represented by that
node.</p>
      <p>The lattice is ordered by set inclusion: higher nodes have more general attribute sets (smaller intent,
larger extent), while lower nodes represent more specific combinations (larger intent, smaller extent).
Edges between concepts indicate subset relationships among attribute sets.</p>
      <p>Several important observations emerge from the structure of the concept lattice. These insights
reflect not only the frequency and co-occurrence of attributes, but also reveal potentially anomalous or
rare behavioral patterns in the dataset:</p>
      <list list-type="bullet">
        <list-item>
          <p>The most frequent attribute is keywords_event_log_classic, present in 63% of the events,
followed by is_weekend (47%) and keywords_correlation_hint (16%).</p>
        </list-item>
        <list-item>
          <p>The time-based attribute is_night_activity appears in only 1% of the events, reflecting
the fact that the dataset contains only a small fraction of activity recorded during nighttime
hours. This low support limits its general influence but makes it a useful indicator for identifying
potential anomalies occurring outside regular working hours.</p>
        </list-item>
        <list-item>
          <p>Concepts near the bottom of the lattice (with very small support, e.g., &lt; 1%) can represent
highly specific and rare patterns. These may be of interest in anomaly detection or in identifying
suspicious behavioral combinations.</p>
        </list-item>
        <list-item>
          <p>No objects were found to match the bottom-most node (extent = 0), meaning that no event shares
the full set of binary attributes simultaneously, which is expected.</p>
        </list-item>
      </list>
      <p>In summary, the concept lattice provides a hierarchical view of how event attributes co-occur,
revealing frequent combinations and rare intersections. This structure can guide both rule extraction
(via implications) and further filtering of unusual behavior for investigation, which is described in the
following paragraphs.</p>
      <p>In addition to the concept lattice analysis, we applied attribute implication mining and association
rule extraction<fn><p>The concept lattice, attribute implications and association rules were generated using the
Concept Explorer tool, a software environment for Formal concept analysis, available at
http://conexp.sourceforge.net.</p></fn> on the same binary dataset consisting of 10,306 rows and 11 attributes.
The goal was to discover logical dependencies between attribute combinations, expressed in the form of implications
S ⇒ T with associated support and confidence values. The resulting rules provide interpretable
insights into frequent co-occurrences, typical system behaviors, and potential temporal or structural
patterns in the event data. A selection of the most relevant rules is summarized in Table 2.</p>
      <p>The rules in Table 2 reveal strong co-occurrence patterns among audit and keyword attributes.
For instance, rules 1, 3, 5, and 6 show that whenever certain system-level activities occur (such as
audit_policy_change, audit_logon_logoff, or audit_account_management), the events are
consistently logged using the legacy Windows event log format (keywords_event_log_classic).
This highlights the dominant presence of classical logging mechanisms in administrative operations.</p>
      <p>Rule 2 suggests that nearly all policy change events (in combination with legacy logging) occur during
the weekend, which may indicate scheduled maintenance or non-standard administrative behavior.</p>
      <p>Rule 7 implies that all events recorded during the night (is_night_activity) also occurred on
weekends (is_weekend). This narrow temporal window may reflect an intentional restriction or
filtering in the data source, as already noted in the concept lattice analysis.</p>
      <p>Rule 8 shows that such weekend-night events are typically logged using the classic format, but with
slightly reduced confidence (86%), which might hint at occasional outliers or logging exceptions.</p>
      <p>Together, these rules help identify deterministic and near-deterministic patterns in event attributes,
allowing for the creation of a behavioral baseline. Deviations from these rules could serve as indicators
of anomalies or unusual system activity.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this study, we explored the application of Formal concept analysis to a security dataset, the NIST Data
Leakage Case EVT. Using a carefully selected set of binary audit attributes and engineered time-based
features, we constructed a formal context with 10,306 objects and 11 attributes. This enabled us to
apply Formal concept analysis methods for uncovering logical dependencies and co-occurrence patterns
within system event logs.</p>
      <p>We first analyzed the structure of the dataset using a concept lattice, which revealed 26 formal
concepts organized by attribute inclusion. This hierarchical representation provided insights into the
frequency and overlap of various event types, highlighting typical combinations as well as rare and
potentially anomalous ones. In particular, we identified rare conjunctions involving weekend and
night-time activity that may warrant further security inspection.</p>
      <p>Building on the concept lattice, we extracted both exact attribute implications (100% confidence) and
approximate association rules (with confidence less than 100%). These rules capture stable behavioral
patterns within the dataset and serve as a form of interpretable knowledge discovery. For instance, we
found that all system-level and account management events consistently appear with legacy event log
markers, and that night-time activity is limited and correlated with weekend occurrences.</p>
      <p>In this study, we focused on the classical framework of Formal concept analysis applied to binary
(single-valued) formal contexts, where the incidence relation strictly defines whether an object possesses
a given attribute. While this approach provides a solid foundation for structural analysis and dependency
mining, it assumes crisp object-attribute relationships. Future work could therefore benefit from integrating
fuzzy and many-valued extensions of Formal concept analysis, which enable reasoning over graded or uncertain
information. Notably, theoretical foundations developed by Bělohlávek [24], Butka et al. [25], and Medina et al.
[26] offer promising directions for adapting our framework to more complex, real-world data scenarios.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
      <p>This research was carried out within the project "Automatization of Digital Forensics and Incident
Response (ADFIR)" (project code 09-I05-03-V02-00079), funded under the Recovery and Resilience
Plan of the Slovak Republic K9 scheme: "Effective management and support of funding for science,
research and innovation" approved by the Council of the European Union. The project, implemented
at Pavol Jozef Šafárik University in Košice in collaboration with IstroSec s.r.o. and the European
Information Society Institute, aims to develop an automated framework for the collection, normalization,
and evaluation of digital traces, while ensuring their integrity and legal admissibility, to empower
cybersecurity teams in responding to incidents and reducing the impact of cyber-attacks.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>sensitive information in e-health datasets using FCA approach, IEEE Access 11 (2023) 62591–62604.</p>
      <p>[21] J.-L. Guigues, V. Duquenne, Familles minimales d’implications informatives résultant d’un tableau
de données binaires, Mathématiques et Sciences humaines 95 (1986) 5–18.</p>
      <p>[22] G. Stumme, Conceptual Knowledge Discovery with Frequent Concept Lattices, Technical
Report FB4-Preprint 2043, Fachbereich Informatik, Technische Universität Darmstadt, Darmstadt,
Germany, 1999.</p>
      <p>[23] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large
databases, in: Proceedings of the ACM SIGMOD International Conference on Management of Data,
SIGMOD ’93, ACM, Washington, D.C., USA, 1993, pp. 207–216. doi:10.1145/170035.170072.</p>
      <p>[24] R. Bělohlávek, Lattices of fixed points of fuzzy Galois connections, Mathematical Logic Quarterly
47 (2001) 111–116.</p>
      <p>[25] P. Butka, J. Pócs, J. Pócsová, Distributed computation of generalized one-sided concept lattices on
sparse data tables, Computing and Informatics 34 (2015) 77–98.</p>
      <p>[26] J. Medina-Moreno, M. Ojeda-Aciego, J. Pócs, E. Ramírez-Poussa, On the Dedekind-MacNeille
completion and formal concept analysis based on multilattices, Fuzzy Sets and Systems 303 (2016)
1–20. doi:10.1016/j.fss.2016.01.007.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>Eberle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Holder</surname>
          </string-name>
          ,
          <article-title>Insider threat detection using graph-based approaches</article-title>
          , in: 2009
          <source>Cybersecurity Applications &amp; Technology Conference for Homeland Security (CATCH)</source>
          , IEEE, Washington, DC, USA,
          <year>2009</year>
          , pp.
          <fpage>237</fpage>
          -
          <lpage>241</lpage>
          . doi:10.1109/CATCH.2009.7.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gates</surname>
          </string-name>
          ,
          <article-title>Defining the insider threat</article-title>
          , in:
          <source>Proceedings of the 4th Annual Workshop on Cyber Security and Information Intelligence Research (CSIIRW)</source>
          , CSIIRW '08, ACM, Oak Ridge, TN, USA,
          <year>2008</year>
          , pp.
          <fpage>15:1</fpage>
          -
          <lpage>15:3</lpage>
          . doi:10.1145/1413140.1413158.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Al-Mhiqani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Alsboui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Al-Shehari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Abdulkareem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Mohammed</surname>
          </string-name>
          ,
          <article-title>Insider threat detection in cyber-physical systems: a systematic literature review</article-title>
          ,
          <source>Computers and Electrical Engineering</source>
          <volume>119</volume>
          (
          <year>2024</year>
          )
          <fpage>109489</fpage>
          . doi:10.1016/j.compeleceng.2024.109489.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Marková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sokol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Krišáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kováčová</surname>
          </string-name>
          ,
          <article-title>Dataset of Windows operating system forensics artefacts</article-title>
          ,
          <source>Data in Brief</source>
          <volume>55</volume>
          (
          <year>2024</year>
          )
          <fpage>110693</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] National Institute of Standards and Technology,
          <article-title>Data leakage test case</article-title>
          , https://cfreds-archive.nist.gov/data_leakage_case/data-leakage-case.html,
          <year>2025</year>
          . CFReDS forensic reference dataset.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ganter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wille</surname>
          </string-name>
          ,
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          , Springer, Berlin, Heidelberg,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Carpineto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Romano</surname>
          </string-name>
          ,
          <source>Concept Data Analysis: Theory and Applications</source>
          , John Wiley &amp; Sons,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Cornejo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Medina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Ocaña</surname>
          </string-name>
          ,
          <article-title>Attribute implications in multi-adjoint concept lattices with hedges</article-title>
          ,
          <source>Fuzzy Sets and Systems</source>
          <volume>479</volume>
          (
          <year>2024</year>
          )
          <fpage>108854</fpage>
          . doi:10.1016/j.fss.2023.108854.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cordero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Enciso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ángel</given-names>
            <surname>Mora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pérez-Gámez</surname>
          </string-name>
          ,
          <article-title>Attribute implications with unknown information based on weak Heyting algebras</article-title>
          ,
          <source>Fuzzy Sets and Systems</source>
          <volume>490</volume>
          (
          <year>2024</year>
          )
          <fpage>109026</fpage>
          . doi:10.1016/j.fss.2024.109026.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bělohlávek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Trnečka</surname>
          </string-name>
          ,
          <article-title>Semantic explorations in factorizing Boolean data via formal concepts</article-title>
          ,
          <source>International Journal of Approximate Reasoning</source>
          <volume>173</volume>
          (
          <year>2024</year>
          )
          <fpage>109247</fpage>
          . doi:10.1016/j.ijar.2024.109247.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ojeda-Hernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>López-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ángel</given-names>
            <surname>Mora</surname>
          </string-name>
          ,
          <article-title>A formal concept analysis approach to hierarchical description of malware threats</article-title>
          ,
          <source>Forensic Science International: Digital Investigation</source>
          <volume>50</volume>
          (
          <year>2024</year>
          )
          <fpage>301797</fpage>
          . doi:10.1016/j.fsidi.2024.301797.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sokol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Krídlo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kováčová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Krajči</surname>
          </string-name>
          ,
          <article-title>The analysis of digital evidence by formal concept analysis</article-title>
          , in:
          <source>Proceedings of the International Conference on Concept Lattices and Their Applications (CLA)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>147</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sokol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Krídlo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kováčová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Krajči</surname>
          </string-name>
          ,
          <article-title>Formal concept analysis approach to understand digital evidence relationships</article-title>
          ,
          <source>International Journal of Approximate Reasoning</source>
          <volume>159</volume>
          (
          <year>2023</year>
          )
          <fpage>108940</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>I. B.</given-names>
            <surname>Senkyire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.-A.</given-names>
            <surname>Kester</surname>
          </string-name>
          ,
          <article-title>Social engineering cybercrime evidence analysis using formal concept analysis</article-title>
          , in:
          <source>2021 International Conference on Cyber Security and Internet of Things (ICSIoT)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Waziri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Umar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Olalere</surname>
          </string-name>
          ,
          <article-title>E-fraud forensics investigation techniques with formal concept analysis</article-title>
          ,
          <source>International Journal of Cyber-Security and Digital Forensics</source>
          <volume>3</volume>
          (
          <year>2014</year>
          )
          <fpage>235</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N. T.</given-names>
            <surname>Binh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Doi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. T.</given-names>
            <surname>Tho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Hai</surname>
          </string-name>
          ,
          <article-title>Feature-driven formal concept analysis for malware hierarchy construction</article-title>
          , in:
          <source>International Workshop on Multi-disciplinary Trends in Artificial Intelligence</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>385</fpage>
          -
          <lpage>396</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ojeda-Hernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>López-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mora</surname>
          </string-name>
          ,
          <article-title>A formal concept analysis approach to hierarchical description of malware threats</article-title>
          ,
          <source>Forensic Science International: Digital Investigation</source>
          <volume>50</volume>
          (
          <year>2024</year>
          )
          <fpage>301797</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Tamburri</surname>
          </string-name>
          ,
          <article-title>Design principles for the General Data Protection Regulation (GDPR): A formal concept analysis and its evaluation</article-title>
          ,
          <source>Information Systems</source>
          <volume>91</volume>
          (
          <year>2020</year>
          )
          <fpage>101469</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Maluleke</surname>
          </string-name>
          ,
          <article-title>A formal concept analysis driven ontology for ICS cyberthreats</article-title>
          , in:
          <source>Proceedings of the South African Conference for Artificial Intelligence Research (SACAIR)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>247</fpage>
          -
          <lpage>257</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Brahmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Alaerjan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mhamdi</surname>
          </string-name>
          ,
          <article-title>Enhancing security and privacy preservation of</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>