<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards Validation of Insider Threat Identification Algorithm with Synthetic Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oksana Nikiforova</string-name>
          <email>oksana.nikiforova@rtu.lv</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vitaly Zabiniako</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>“ABC software” Ltd.</institution>
          ,
          <country country="LV">Latvia</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>Riga Technical University</institution>
          ,
          <addr-line>Riga</addr-line>
          ,
          <country country="LV">Latvia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This paper addresses the challenge of detecting insider threats in cybersecurity by proposing behavior model-driven approaches. It argues that existing datasets fail to capture nuanced user activities accurately and proposes an enhanced dataset generated with a more elegant structure. The paper discusses the evolving threat landscape and the need for proactive cybersecurity measures, presents a taxonomy of insiders, and emphasizes the importance of behavior-driven approaches. It outlines the limitations of existing datasets and introduces the proposed data generator structure, explaining its components and implementation logic. The paper illustrates a use case showcasing the application of generated data for insider threat identification. It concludes by stressing the significance of behavior-driven approaches and high-quality datasets in enhancing detection capabilities against insider threats.</p>
      </abstract>
      <kwd-group>
        <kwd>Insider threat identification</kwd>
        <kwd>machine learning</kwd>
        <kwd>cyber security</kwd>
        <kwd>synthetic dataset generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Insider threat identification stands as a pressing concern within cybersecurity landscapes.
Unlike external threats, insiders possess legitimate access to organizational systems,
making their detection inherently challenging. The potential consequences of insider
threats, including data breaches, intellectual property theft, and sabotage, underscore the
criticality of effective identification mechanisms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Addressing this issue requires a refined
understanding of insider behaviors and advanced detection methodologies. Thus,
the recognition of insider threats as a relevant issue reflects the evolving threat landscape
and underscores the imperative for proactive cybersecurity measures [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The state of the art of insider threat detection is characterized by a rapidly growing array
of methodologies and techniques under continual development. Researchers and
cybersecurity professionals are actively exploring diverse approaches to enhance the
efficacy and robustness of detection systems [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These approaches encompass a
spectrum of methodologies, ranging from rule-based systems to advanced machine learning
algorithms. Each approach brings its own strengths and limitations, reflecting the complex
nature of insider threat detection. This diversity in approaches reflects a collective effort to
address the evolving challenges posed by insider threats, with ongoing research aimed at
pushing the boundaries of detection capabilities.
      </p>
      <p>
        Central to the advancement of insider threat detection algorithms is the availability of
high-quality datasets for accurate testing and evaluation. However, acquiring and managing
datasets that accurately represent the complexities of insider threat behaviors poses a
significant challenge. Traditional datasets often lack the diversity and granularity necessary
to effectively train and validate detection systems. Moreover, the sensitive nature of insider
threat data complicates the sharing and accessibility of suitable datasets [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Addressing
this challenge requires innovative approaches to dataset development, including synthetic
data generation techniques. Overcoming this obstacle is crucial for advancing the
state of the art in insider threat detection algorithms.
      </p>
      <p>To address the problem of obtaining suitable datasets for testing insider threat detection
algorithms, this paper presents a corresponding solution: an algorithmic data generator tailored
specifically for this purpose. The data generator makes it possible to precisely define and simulate
realistic insider threat scenarios, producing datasets that encapsulate a wide range of
behavioral patterns. The generated datasets serve as valuable resources for training,
testing, and benchmarking insider threat detection algorithms.</p>
      <p>
        The paper is organized as follows. The next section gives an overview of research on
insider threat identification and of the datasets used for its validation, including some
examples of datasets suitable for validating insider threat identification algorithms. The
third section describes the data generation algorithm proposed by the authors, in which
typical scenarios are defined for the use of common information systems and predefined
probabilities are set to simulate the regular work of information system users. The fourth
section demonstrates a use case in which the generated data are applied to validate the
insider threat identification approach proposed by the authors and published in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
The fifth section presents the main conclusions of the research, its real-world implications,
and future directions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>
        Recent years have seen an increase in insider threats, such as those involving Edward
Snowden, Chelsea Manning and Kim [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. A 2018 cybersecurity report found that more than half of the threats studied came
from within organizations, and that for 27% dealing with insider risks was a common
occurrence [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. A later report showed a noticeable rise in perceived insider attacks: 63% of those
surveyed felt that such activity had increased. There is concern that insider attacks are
becoming harder to recognize than external threats, because insiders have been granted
legitimate access and their actions can be complex [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Insider attacks frequently happen within normal working hours, which makes them
difficult to isolate when examining large activity logs [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        An insider typically refers to an individual possessing authorized access to an
organization's computer systems and networks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Concern is growing about the difficulty of detecting insider attacks compared to
external threats. Insiders, who do not necessarily have high-level technical abilities, blur
the distinction between harmful and authorized actions. Insiders can be separated into
three main categories: traitors, masqueraders, and unintentional offenders. Traitors act for
personal or financial gain, masqueraders pose as legitimate users while carrying out
illegitimate activity, and unintentional offenders breach security by mistake. Typically,
malicious insiders commit IT sabotage, steal intellectual property, or conduct fraud for
financial reasons [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        Validation of insider threat identification algorithms requires datasets that capture
user activities within organizational networks, where users perform different activities in
particular information systems under specific scenarios. Datasets therefore play a crucial
role in the study of insider threat detection. However, a comprehensive real-world dataset
that is publicly available is still lacking, and many existing datasets rely on artificially
constructed attacks. In a recent survey [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], the
datasets are classified as masquerader-based, traitor-based, substituted-masquerader, and
identification/authentication-based types, along with other malicious ones. The datasets
investigated by the authors are as follows: Amazon Employee Access Challenge (AEAC), CERT,
DARPA Insider Threat Evaluation (DITREC), Enron, Greenberg’s dataset, Los Alamos
National Laboratory (LANL), RUU, Schonlau, TWOS, and University of New Brunswick datasets.
      </p>
      <p>When utilizing such datasets, researchers must comply with any data usage
agreements and ethical guidelines. Furthermore, it is crucial to assess the diversity and
realism of the scenarios depicted within the dataset in order to measure accurately both the
effectiveness and the efficiency of insider threat identification algorithms.</p>
      <p>Based on an analysis of insider threat identification algorithms, the authors propose to
categorize the algorithms according to the taxonomy shown in Figure 1.</p>
      <p>The taxonomy divides the algorithms into four parts: behavior-oriented, statistical
model-driven, rule-based, and algorithms working with text.</p>
      <p>The set of insider detection algorithms based on user behavioral analysis includes,
among others, the following techniques:</p>
      <list list-type="bullet">
        <list-item><p>Markov Chains / Hidden Markov Model (HMM) [<xref ref-type="bibr" rid="ref18">18</xref>];</p></list-item>
        <list-item><p>Clustering Algorithms [<xref ref-type="bibr" rid="ref19">19</xref>];</p></list-item>
        <list-item><p>Graph-Based Approaches [<xref ref-type="bibr" rid="ref20">20</xref>];</p></list-item>
        <list-item><p>Self-Organizing Maps (SOM) [<xref ref-type="bibr" rid="ref21">21</xref>];</p></list-item>
        <list-item><p>Gaussian Mixture Models (GMM) [<xref ref-type="bibr" rid="ref22">22</xref>];</p></list-item>
        <list-item><p>Nearest Neighbor Methods [<xref ref-type="bibr" rid="ref23">23</xref>];</p></list-item>
        <list-item><p>One-Class Support Vector Machines (SVM) [<xref ref-type="bibr" rid="ref24">24</xref>].</p></list-item>
      </list>
      <p>Hybrid methods combine multiple base models to improve the overall detection
performance. By leveraging the diversity of multiple models, hybrid methods enhance the
robustness and reliability of insider threat detection systems.</p>
      <p>The focus of the paper is on behavior model-driven approaches, where user behavior
scenarios in the form of user activity graphs are applied to identify insider threats. All
algorithms for insider threat identification that use a behavior model as the core of the
approach require data about users’ activities in information systems in greater
detail than the datasets described above offer. Audit logs that capture
business-level actions (e.g., logins, document accesses, data edits, information searches) are of
particular importance. Such logs provide the granular insights necessary for accurate training
and testing of algorithms designed to detect insider threats. By examining such nuanced
data and behavioral patterns, algorithms can better identify anomalies indicative of
malicious intent or unauthorized activities.</p>
      <p>Identifying insider threats at the business level requires a detailed analysis of users’
activities. The data sources mentioned above are less suitable in this
respect, so an improved dataset is proposed below, based on the implementation logic of
the proposed Data Generator.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Data Generator General Structure</title>
      <p>The data generator produces simulated user activity data for a specified number
of users within a specified time period with certain probability settings. The generated data
include the following information:</p>
      <list list-type="bullet">
        <list-item><p>UserName: user identifier. In the generated data, this is a randomly chosen female name (from an international list of female names) and a surname (a randomly chosen animal name).</p></list-item>
        <list-item><p>SessionId: session identifier. In the generated data, a session is a set of actions that starts with the first action performed by a user in the information system (usually the “login” action) and ends with the exit from the information system (usually the “logout” action).</p></list-item>
        <list-item><p>ActionTime: time of action execution.</p></list-item>
        <list-item><p>ActionName: name (identifier) of the action.</p></list-item>
      </list>
      <p>The generated data simulate actions from an e-mail system (e.g., open Outlook, view
e-mail list, open e-mail, send e-mail), a Web browser (e.g., open, view, open gmail, open
google drive, download file), and an information system (e.g., open menu, search
data, edit data). All scenarios start with a log-in action and include some actions with low
probability and some malicious actions, which are named maliciousAction1, maliciousAction2,
and maliciousAction3 and are performed once for a specific user.</p>
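      <p>As an illustration only, one record of such an audit log could be represented as in the following minimal sketch. The class name AuditRecord, the helper random_user_name, and the two short name pools are hypothetical and are not taken from the generator itself; only the four field names and the naming rule (female first name plus animal surname) come from the description above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random
from dataclasses import dataclass
from datetime import datetime

# Hypothetical name pools; the text only specifies "female name" + "animal name".
FIRST_NAMES = ["Miranda", "Anna", "Sofia", "Elena"]
ANIMAL_SURNAMES = ["Hyena", "Otter", "Falcon", "Lynx"]


@dataclass(frozen=True)
class AuditRecord:
    """One audit-log row with the four fields named in the text."""
    user_name: str         # UserName
    session_id: int        # SessionId
    action_time: datetime  # ActionTime
    action_name: str       # ActionName


def random_user_name(rng: random.Random) -> str:
    """Combine a random first name with a random animal surname."""
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(ANIMAL_SURNAMES)}"


rng = random.Random(0)  # fixed seed so the sketch is reproducible
record = AuditRecord(
    user_name=random_user_name(rng),
    session_id=1,                              # illustrative session number
    action_time=datetime(2024, 1, 8, 9, 0),    # illustrative timestamp
    action_name="login",                       # scenarios start with a log-in
)
```
]]></preformat>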
      <p>The foundation of the data generator is a script that artificially
generates a dataset mimicking user actions in information systems and records it in the
form of an audit log. The generator emulates user actions within a
specified time period with defined probabilities and according to predefined behavioral
models (including options for simulating malicious activities). The data generation cycle
applies random value generation within the specified intervals for the performed
actions. Within this cycle, other necessary conditions are also observed, such
as: if the execution time of user actions approaches the end of the workday, the user
sequentially ends work in all active sessions by performing the “logoff” action; and if work
is completed on a Friday, the next set of actions commences only on Monday
morning.</p>
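      <p>The generation cycle described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' script: the action weights, the working hours (09:00 to 17:00), the gap interval, and the restriction to one session per workday are all assumed for the example, and the function names generate and next_workday_start are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random
from datetime import datetime, timedelta

# Illustrative e-mail scenario; the weights are assumptions, with saveMail rare.
EMAIL_ACTIONS = {"viewEmailList": 0.45, "openEmail": 0.35,
                 "sendEmail": 0.19, "saveMail": 0.01}
WORKDAY_START_HOUR, WORKDAY_END_HOUR = 9, 17  # assumed working hours


def next_workday_start(day: datetime) -> datetime:
    """Return the start of the next working day, skipping Saturday and Sunday."""
    nxt = day + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt.replace(hour=WORKDAY_START_HOUR, minute=0, second=0, microsecond=0)


def generate(user, start, workdays, rng, min_gap_s=30, max_gap_s=900):
    """Return (user, session_id, time, action) tuples, one session per workday."""
    names, weights = zip(*EMAIL_ACTIONS.items())
    day_start = start.replace(hour=WORKDAY_START_HOUR, minute=0,
                              second=0, microsecond=0)
    if day_start.weekday() >= 5:
        day_start = next_workday_start(day_start)
    records = []
    for session_id in range(1, workdays + 1):
        day_end = day_start.replace(hour=WORKDAY_END_HOUR)
        now = day_start
        records.append((user, session_id, now, "login"))
        while True:
            # Random pause between consecutive actions within the given interval.
            now += timedelta(seconds=rng.randint(min_gap_s, max_gap_s))
            if now >= day_end:
                # End of the workday reached: close the session.
                records.append((user, session_id, day_end, "logoff"))
                break
            action = rng.choices(names, weights=weights, k=1)[0]
            records.append((user, session_id, now, action))
        day_start = next_workday_start(day_start)  # Friday is followed by Monday
    return records


# 5 January 2024 is a Friday, so the second session falls on Monday, 8 January.
log = generate("Miranda Hyena", datetime(2024, 1, 5), 2, random.Random(42))
```
]]></preformat>
      <p>Because the weights are passed to a weighted random choice, lowering a single weight (here saveMail) is enough to make an action rare without removing it from the scenario.</p>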
      <p>After data generation, the algorithm performs additional session filtering to remove
extremely short and extremely long sessions, that is, sessions whose action counts fall below
or above the specified limits on the number of actions performed during one session. This
improves the quality of the generated data from the perspective of data volume. Then the
algorithm arranges the generated data records in ascending order of action execution time
and saves the final results.</p>
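      <p>A minimal sketch of this post-processing step is given below, assuming that records are dictionaries keyed by the four field names and that the limits are passed as min_actions and max_actions. These parameter names, the helper row, and the sample values are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from datetime import datetime


def filter_and_sort(records, min_actions, max_actions):
    """Drop sessions with too few or too many actions, then sort by ActionTime."""
    by_session = defaultdict(list)
    for rec in records:
        by_session[rec["SessionId"]].append(rec)
    kept = [
        rec
        for recs in by_session.values()
        if min_actions <= len(recs) <= max_actions
        for rec in recs
    ]
    return sorted(kept, key=lambda rec: rec["ActionTime"])


def row(session_id, minute, action):
    """Build one illustrative audit-log row (the user name is a placeholder)."""
    return {
        "UserName": "Anna Otter",
        "SessionId": session_id,
        "ActionTime": datetime(2024, 1, 8, 9, minute),
        "ActionName": action,
    }


records = [
    row(2, 30, "logout"),
    row(1, 0, "login"),       # session 1 has a single action and is dropped
    row(2, 5, "login"),
    row(2, 10, "searchData"),
]
clean = filter_and_sort(records, min_actions=2, max_actions=50)
```
]]></preformat>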
      <p>In addition to the generated data from the historical period, real-time data streaming
functionality has been implemented. The utilization of real-time data generation enables
the immediate detection of security incidents by insider threat identification algorithms and
real-time reporting of security incidents to the security administrator.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Use Case of Generated Data Application</title>
      <p>
        The developed data generator has been applied to validate a custom algorithm created by
the authors, which identifies malicious user actions based on the audit
logs of user-executed actions within the IS by constructing a user behavior model in
the form of a graph for specific IS usage. The user behavior model makes it possible to draw
conclusions regarding anomalies in a specific user’s behavior (or deviations in it), which
are identified as malicious activities, and to alert the data security administrator
immediately when potential breaches occur [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>In this paper, the authors demonstrate the validation of a list of malicious actions
identified by their algorithm using a dataset of user activities generated by the data
generator described above. This enables mutual validation, since it is known a priori
which data in the generated dataset are malicious and which sessions should be
identified as malicious. Consequently, the algorithm should produce these sessions as a
result. Conversely, the ability to utilize the generated data in the insider threat
identification algorithm confirms that the data generator has produced data suitable for
validating such an algorithm.</p>
      <p>The validation was conducted using generated activity data for 20 users, simulated over
a period of 30 days. In the three scenarios (e-mail, web browser, and IS), the actions
saveMail (e-mail), downloadFile (web browser), and printDocument (IS) are executed with
significantly lower probabilities than the other actions in these scenarios. This mirrors
real-world data, which contain rarely performed actions that potentially represent
malicious activities.</p>
      <p>
        Furthermore, to simulate the execution of malicious actions, sessions for the user
“Miranda Hyena” are generated that incorporate entirely malicious actions with specific
names such as “maliciousAction1”, “maliciousAction2”, and “maliciousAction3”. As a result,
the trust coefficient should be less than 100 only for the user Miranda
Hyena, as shown in Figure 2, which lists the top five users. The figure illustrates that
Miranda Hyena occupies the highest position and is the only user for whom malicious
sessions are identified according to the various criteria. Columns 1 through 6 indicate
the number of sessions recognized as malicious according to the different applied criteria (the
algorithm that calculates malicious sessions based on extreme difference in a subgrouping
provided a more precise result with a count of 4 malicious sessions, indicating that the
application of generated data for algorithm validation revealed instances of more
accurate computation [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]). The presence of malicious sessions for a particular user
corresponds to the trust percentage, which is calculated from the number of unique actions
in a session and the number of unique malicious actions in a session. Thus, the trust
percentage can replace all six coefficients, allowing the security administrator in a real
system to operate based solely on the total number of malicious sessions for each user.
      </p>
      <p>The malicious sessions, for which the trust percentage is less than 100% (indicating that the
session contains at least one malicious action), are collected in a separate list, as shown in
Figure 3. The synthetic data contain 4 sessions with malicious actions (sessions
with identifiers 11568, 11573, 11403 and 11579). The intentionally
generated sessions with malicious actions are identified in the resulting list and even
appear at the top of the list (see Figure 3). Figure 4 presents a fragment of the detailed
action data for session 11568, which contains malicious actions. The full session in the form of
a graph is shown in Figure 5.</p>
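      <p>The text states only that the trust percentage depends on the number of unique actions and the number of unique malicious actions in a session; the exact formula is not given here. The sketch below therefore assumes one plausible form, 100 * (1 - unique malicious actions / unique actions), which equals 100 exactly when no action of the session is flagged. The function name trust_percentage, the session contents, and the session identifier 11000 are illustrative assumptions, not data from the experiment.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def trust_percentage(actions, malicious_actions):
    """Trust of one session in percent (100 means no flagged action).

    Assumed formula, not taken from the paper:
    100 * (1 - unique flagged actions / unique actions).
    """
    unique = set(actions)
    if not unique:
        return 100.0
    flagged = unique & set(malicious_actions)
    return 100.0 * (1 - len(flagged) / len(unique))


MALICIOUS = {"maliciousAction1", "maliciousAction2", "maliciousAction3"}

# Illustrative action sequences; they are not the real content of these sessions.
sessions = {
    11568: ["login", "openEmail", "maliciousAction1", "openEmail", "logout"],
    11000: ["login", "openEmail", "sendEmail", "logout"],  # hypothetical clean session
}

trust = {sid: trust_percentage(acts, MALICIOUS) for sid, acts in sessions.items()}

# Sessions below 100% go to the separate list, least trusted first.
suspicious = sorted((sid for sid, t in trust.items() if t < 100.0), key=trust.get)
```
]]></preformat>
      <p>In this sketch the set of malicious action names is known in advance because the data are synthetic; in a real system that set would be whatever the detection algorithm flags.</p>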
      <p>To summarize, the results of identifying malicious users align with expectations: the
user who had sessions with malicious actions embedded in the synthetic data has the
highest maliciousness coefficient. The results of identifying malicious sessions align with
expectations as well: the sessions with the highest maliciousness were identified, and they
contain the artificially inserted malicious actions. The visualization results of user session
graphs also align with expectations: transitions to and from artificially introduced atypical
malicious actions are visually highlighted with color.</p>
      <p>Behavior analysis in subgroups identifies malicious actions more accurately than
individual and group behavior analysis. It is recommended not
to apply malicious action identification directly to individual behavior models in groups
with 1-2 users, but rather to analyze subgroup behavior models in groups with three or
more users. Parameters based on searching for actions with extremely low probabilities
identify malicious sessions more accurately than parameters based on
the average probability of a session. Overall, the trust percentage of sessions,
calculated by considering the number of unique actions in a session and the number of
unique malicious actions in a session, can replace all six coefficients.</p>
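      <p>The difference between the two kinds of session parameters can be illustrated with a small first-order Markov chain estimated from action sequences. This is a hedged sketch rather than the authors' implementation: the functions transition_probabilities and session_scores, the toy sessions, and the choice of minimum versus mean transition probability as stand-ins for the two parameter families are assumptions made for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities from action sequences."""
    counts = defaultdict(Counter)
    for actions in sessions:
        for src, dst in zip(actions, actions[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(c.values()) for dst, n in c.items()}
        for src, c in counts.items()
    }


def session_scores(actions, probs):
    """Return (minimum, mean) transition probability of one session.

    Expects a session with at least two actions.
    """
    p = [probs.get(src, {}).get(dst, 0.0) for src, dst in zip(actions, actions[1:])]
    return min(p), sum(p) / len(p)


# Toy data: nine regular sessions and one containing the rare printDocument action.
regular = ["login", "searchData", "editData", "logout"]
rare = ["login", "searchData", "editData", "printDocument", "logout"]
sessions = [regular] * 9 + [rare]

probs = transition_probabilities(sessions)
regular_min, regular_mean = session_scores(regular, probs)
rare_min, rare_mean = session_scores(rare, probs)
```
]]></preformat>
      <p>In this toy example the minimum separates the two sessions far more sharply than the mean, because one rare transition is averaged together with several common ones. This matches the observation above that parameters based on extremely low probabilities are more precise than the average probability of a session.</p>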
      <p>As for the evaluation of the generated data, the generator produced sessions whose data
allow malicious actions to be identified. The synthetic dataset can therefore serve as
a basis for validating insider threat identification algorithms.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>To sum up, this research addresses a major gap in the realm of insider threat identification
algorithms by introducing a solution tailored explicitly for behavior-based
approaches. While various datasets exist for testing such algorithms, none provide
the level of granularity necessary to thoroughly evaluate business logic, particularly
for behavior-based algorithms. The proposed solution bridges this gap, offering
researchers and practitioners an easily configurable platform to test and refine detection
strategies effectively.</p>
      <p>One key feature of our solution is the flexibility it provides in experimenting with action
probabilities within the data generator configuration. This enables the creation of diverse
user behavior scenarios, allowing for thorough testing of algorithm robustness under
various circumstances. By simulating different types of user behavior, researchers can gain
valuable insights into the effectiveness of their algorithms across a spectrum of potential
threat scenarios.</p>
      <p>Moreover, our synthetic data generation capability offers the opportunity to enhance
real historical IS audit data. By seamlessly integrating our generated data with existing
datasets, researchers can augment the pool of users’ behavioral models and activities,
thereby enriching the dataset for more comprehensive analysis. This integration enables
the exploration of new user behaviors and the evaluation of algorithm performance in
scenarios not usually encountered in real-world data.</p>
      <p>Overall, our proposed solution empowers researchers and practitioners to advance their
understanding of security risks and enhance their defense mechanisms. By providing a
versatile platform for generating suitable datasets for algorithm testing and refinement, our
solution contributes to the ongoing efforts to combat insider threats and safeguard sensitive
information in today’s dynamic organizational environments.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The research leading to these results has received funding from the research project
“Competence Centre of Information and Communication Technologies” of EU Structural
funds, contract No. 5.1.1.2.i.0/1/22/A/CFLA/008 signed between IT Competence Centre
and Central Finance and Contracting Agency. The research title is “Development of a method
for analysis and automatic grouping of information system users with similar behavior,
using an AI/ML approach”. The project is co-financed by the Recovery Fund of the Action
Program “Latvian Recovery and Resilience Mechanism Plan 5.1.r. 5.1.1.r. of the reform and
investment direction “Increasing productivity through increasing the amount of investment
in R&amp;D” reforms “Management of innovations and motivation of private R&amp;D investments”
5.1.1.2.i. investment “Support instrument for the development of innovation clusters”
implementation rules within the competence centers” framework.</p>
      <p>The intellectual property “System and Method for Detecting Atypical Behavior of Users
in an Information System by Analyzing their Actions Using a Markov Chain and an Artificial
Neural Network” was submitted to the World Intellectual Property Organization on 2021/02/26.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.N.</given-names>
            <surname>Al-Mhiqani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rabiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Abidin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          , et al.,
          <article-title>A Review of Insider Threat Detection: Classification, Machine Learning Techniques, Datasets, Open Challenges, and Recommendations</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>10</volume>
          (
          <issue>15</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>41</lpage>
          (
          <year>2020</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.3390/app10155208</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <source>Cost of Insider Threats Global Report</source>
          ,
          <year>2022</year>
          . URL:
          <uri>https://protectera.com.au/wpcontent/uploads/2022/03/The-Cost-of-Insider-Threats-2022-Global-Report.pdf</uri>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.R.</given-names>
            <surname>Alzaabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mehmood</surname>
          </string-name>
          ,
          <article-title>A Review of Recent Advances, Challenges, and Opportunities in Malicious Insider Threat Detection Using Machine Learning Methods</article-title>
          ,
          <source>IEEE Access</source>
          PP(99),
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          (
          <year>2024</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1109/ACCESS.2024.3369906</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.B.</given-names>
            <surname>Sarhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Altwaijry</surname>
          </string-name>
          ,
          <article-title>Insider Threat Detection Using Machine Learning Approach</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>13</volume>
          (
          <issue>1</issue>
          ),
          <elocation-id>259</elocation-id>
          (
          <year>2022</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.3390/app13010259</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.L.</given-names>
            <surname>Ko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.M.</given-names>
            <surname>Divakaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.S.</given-names>
            <surname>Liau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thing</surname>
          </string-name>
          ,
          <article-title>Insider Threat Detection and its Future Directions</article-title>
          ,
          <source>International Journal of Security and Networks</source>
          <volume>12</volume>
          (
          <issue>3</issue>
          ), (
          <year>2016</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1504/IJSN.2017.10005217</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Lindauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Glasser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rosen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wallnau</surname>
          </string-name>
          ,
          <article-title>Generating test data for insider threat detectors</article-title>
          ,
          <source>Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications</source>
          <volume>5</volume>
          (
          <issue>2</issue>
          ),
          <fpage>80</fpage>
          -
          <lpage>94</lpage>
          (
          <year>2014</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.22667/JOWUA.2014.06.31.080</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Nikiforova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Zabiniako</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kornienko</surname>
          </string-name>
          ,
          <string-name>
            <surname>E-Step</surname>
            <given-names>Control</given-names>
          </string-name>
          :
          <article-title>Solution for Processing and Analysis of IS Users Activities in the Context of Insider Threat Identification Based on Markov Chain</article-title>
          ,
          <source>Intelligent Systems and Applications</source>
          , pp.
          <fpage>345</fpage>
          -
          <lpage>359</lpage>
          .
          <source>Lecture Notes in Networks and Systems</source>
          (
          <year>2024</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1007/978-3-031-47721-8_23</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Garkalns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Nikiforova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Zabiniako</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kornienko</surname>
          </string-name>
          ,
          <article-title>Analysis of the Behavior of Company Employees as Users of Information Systems or Tools, Based on Employees Clustering with K-means Algorithm</article-title>
          ,
          <source>IEEE 64th International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS)</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . (
          <year>2023</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1109/ITMS59786.2023.10317652</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>O.</given-names>
            <surname>Nikiforova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Romanovs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Zabiniako</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kornienko</surname>
          </string-name>
          ,
          <article-title>Detecting and Identifying Insider Threats Based on Advanced Clustering Methods</article-title>
          ,
          <source>IEEE Access</source>
          <volume>12</volume>
          ,
          <fpage>30242</fpage>
          -
          <lpage>30253</lpage>
          (
          <year>2024</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1109/ACCESS.2024.3365424</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.N.</given-names>
            <surname>Al-Mhiqani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Abidin</surname>
          </string-name>
          , et al.:
          <article-title>A new intelligent multilayer framework for insider threat detection</article-title>
          .
          <source>Computers &amp; Electrical Engineering</source>
          <volume>97</volume>
          (
          <year>2022</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1016/j.compeleceng.2021.107597</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <source>Insider Threat Report</source>
          , https://crowdresearchpartners.com/wpcontent/uploads/2017/07/Insider-Threat-Report-2018.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <source>Insider Threat Report</source>
          , https://www.cybersecurity-insiders.com/wpcontent/uploads/2019/11/2020-Insider-Threat-Report-Gurucul.pdf
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bertino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ojo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Choo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Burnap</surname>
          </string-name>
          ,
          <article-title>Impact and Key Challenges of Insider Threats on Organizations and Critical Businesses</article-title>
          .
          <source>Electronics</source>
          <volume>9</volume>
          (
          <issue>9</issue>
          ),
          <elocation-id>1460</elocation-id>
          (
          <year>2020</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.3390/electronics9091460</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Legg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Buckley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldsmith</surname>
          </string-name>
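          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Creese</surname>
          </string-name>
          ,
          <article-title>Automated Insider Threat Detection System Using User and Role-Based Profile Assessment</article-title>
          .
          <source>IEEE Systems Journal</source>
          <volume>99</volume>
          (
          <issue>2</issue>
          ),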
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          (
          <year>2015</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1109/JSYST.2015.2438442</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.P.</given-names>
            <surname>Pfleeger</surname>
          </string-name>
          ,
          <article-title>Reflections on the Insider Threat</article-title>
          . In:
          <source>Insider Attack and Cyber Security</source>
          .
          <series>Advances in Information Security</series>
          (
          <volume>39</volume>
          ) Springer, Boston, MA (
          <year>2008</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1007/978-0-387-77322-3_2</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
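          <string-name>
            <given-names>O.</given-names>
            <surname>De Vel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,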
          <article-title>Detecting and Preventing Cyber Insider Threats: A Survey</article-title>
          .
          <source>IEEE Communications Surveys &amp; Tutorials</source>
          <volume>20</volume>
          (
          <issue>2</issue>
          ),
          <fpage>1397</fpage>
          -
          <lpage>1417</lpage>
          (
          <year>2018</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1109/COMST.2018.2800740</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>I.</given-names>
            <surname>Homoliak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toffalini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guarnizo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Elovici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ochoa</surname>
          </string-name>
          ,
          <article-title>Insight Into Insiders and IT: A Survey of Insider Threat Taxonomies, Analysis, Modeling, and Countermeasures</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>52</volume>
          (
          <issue>2</issue>
          ) (
          <year>2020</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1145/3303771</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dymarski</surname>
          </string-name>
          ,
          <source>Hidden Markov Models, Theory and Applications</source>
          . DOI:
          <pub-id pub-id-type="doi">10.5772/601</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.K.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <source>Data Clustering: Algorithms and Applications</source>
          (
          <year>2014</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1201/9781315373515</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.</given-names>
            <surname>Foggia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Percannella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sansone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vento</surname>
          </string-name>
          ,
          <article-title>A Graph-Based Clustering Method and Its Applications</article-title>
          .
          <source>Advances in Brain, Vision, and Artificial Intelligence</source>
          .
          <fpage>277</fpage>
          -
          <lpage>287</lpage>
          (
          <year>2007</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1007/978-3-540-75555-5_26</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>U.</given-names>
            <surname>Asan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ercan</surname>
          </string-name>
          ,
          <article-title>An Introduction to Self-Organizing Maps</article-title>
          ,
          <source>Computational Intelligence Systems in Industrial Engineering: with Recent Theory and Applications</source>
          , pp.
          <fpage>299</fpage>
          -
          <lpage>319</lpage>
          (
          <year>2012</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.2991/978-94-91216-77-0_14</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>V.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nock</surname>
          </string-name>
          ,
          <article-title>Hierarchical Gaussian Mixture Model</article-title>
          ,
          <source>Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing</source>
          , ICASSP, pp.
          <fpage>14</fpage>
          -
          <lpage>19</lpage>
          . (
          <year>2010</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1109/ICASSP.2010.5495750</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <article-title>Survey of Nearest Neighbor Techniques</article-title>
          ,
          <source>International Journal of Computer Science and Information Security</source>
          .
          <volume>8</volume>
          (
          <issue>2</issue>
          ),
          <fpage>302</fpage>
          -
          <lpage>305</lpage>
          (
          <year>2010</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.48550/arXiv.1007.0085</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Amer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Abdennadher</surname>
          </string-name>
          ,
          <article-title>Enhancing one-class Support Vector Machines for unsupervised anomaly detection</article-title>
          ,
          <source>Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description</source>
          , ODD 2013, pp.
          <fpage>8</fpage>
          -
          <lpage>15</lpage>
          . (
          <year>2013</year>
          ). DOI:
          <pub-id pub-id-type="doi">10.1145/2500853.2500857</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>O.</given-names>
            <surname>Nikiforova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Zabiniako</surname>
          </string-name>
          ,
          <article-title>Beyond Information System User Behavior Models: The Power of User Groups in Preventing Insider Attacks</article-title>
          .
          <source>Intelligent Systems and Applications</source>
          .
          <series>Lecture Notes in Networks and Systems</series>
          (
          <year>2025</year>
          ), accepted for publication.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>