<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontological-Relational Data Store Model for a Cloud-based SIEM System Development</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viktoriya Sydorenko</string-name>
          <email>v.sydorenko@ukr.net</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oksana Zhyharevych</string-name>
          <email>zhyharevych.oksana@vnu.edu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rat Berdybaev</string-name>
          <email>r.berdybaev@aues.kz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artem Polozhentsev</string-name>
          <email>artem.polozhencev@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andriy Fesenko</string-name>
          <email>aafesenko88@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Almaty University of Energy and Communications</institution>
          ,
          <addr-line>126/1 Baytursynuli Str, Almaty, 050013</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lesya Ukrainka Volyn National University</institution>
          ,
          <addr-line>13 Volya Ave., Lutsk, 43025</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Aviation University</institution>
          ,
          <addr-line>1 Liubomyr Huzar Ave, Kyiv, 03058</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>343</fpage>
      <lpage>354</lpage>
      <abstract>
        <p>Security Information and Event Management (SIEM) systems are widely used to prevent information loss in computer systems and networks. Currently, there are different approaches to building databases (data stores) for SIEM systems. Analysis has not revealed a universal type of database, and each has its advantages and disadvantages. This paper presents the rationale for selecting the most effective databases, based on which the model of an ontological-relational data store is implemented. The proposed model uses two different types of databases with appropriate characteristics, to improve the convenience of data storage and classification, to ensure high speed of obtaining large amounts of information through preliminary indexing, as well as load balancing and data replication. These results will be useful both for critical information infrastructure protection and for building various cyber threat monitoring systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Database</kwd>
        <kwd>database management system</kwd>
        <kwd>SIEM</kwd>
        <kwd>ontological-relational data store</kwd>
        <kwd>SQL</kwd>
        <kwd>NoSQL</kwd>
        <kwd>NewSQL</kwd>
        <kwd>load balancing</kwd>
        <kwd>data replication</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>As the number of cyber threats continues to
grow, one of the most effective ways to detect
them and protect information is to deploy
SIEM systems. The use of SIEM in computer
security incident response centers (CSIRTs) is
key to ensuring effective detection, analysis,
and response to security incidents. Here are
some of the main ways to use SIEM in a CSIRT:</p>
      <p>▪ Centralized log collection: an SIEM allows
you to centrally collect, store, and manage logs
from various sources across an organization's
network, including servers, network
equipment, applications, and other security
systems. This gives CSIRTs the ability to
quickly access important data for incident
analysis.</p>
      <p>▪ Event correlation: SIEM can
automatically correlate collected logs and
identify potential security incidents using
various rules, attack signatures, and
behavioral analysis algorithms. This helps the
CSIRT team detect sophisticated attacks that
may go undetected without such analysis.</p>
      <p>▪ Real-time monitoring: SIEM provides
real-time security monitoring, allowing CSIRT
teams to respond to incidents quickly. Using
interactive tools to visualize and analyze data
can help identify unusual or suspicious
activity.</p>
      <p>▪ Decision support: The analytics and
reporting tools that come with an SIEM allow
CSIRT teams to analyze security trends and
identify potential vulnerabilities or security
gaps. This helps to better plan security
strategies and make informed decisions to
strengthen protection.</p>
      <p>▪ Documentation and regulatory compliance:
SIEM helps in automating the collection and
storage of logs for documentation and
reporting as per regulatory and compliance
requirements. This is important for meeting
legal requirements and can serve as evidence
in incident investigations.</p>
      <p>▪ Professional development and training:
With the use of SIEM, CSIRT teams can conduct
training and simulations of real incidents,
analyzing the collected data and practicing
response procedures. This helps increase the
team's preparedness for real-world threats
and improves their ability to respond quickly
and effectively to incidents.</p>
      <p>▪ Automation of response: Some SIEM
systems provide the ability to automate
routine incident response tasks, such as
isolating an infected device, blocking IP
addresses, or sending notifications to the
appropriate individuals. This reduces response
time and frees up resources to focus on more
complex tasks.</p>
      <p>▪ Integration with other security systems:
Integrating SIEMs with other security tools and
systems, such as intrusion detection and
prevention systems (IDS/IPS), antiviruses, and
identity and access management systems, can
provide deeper security analysis and help
detect sophisticated attacks.</p>
      <p>▪ Continuous improvement: Analyzing
incidents and responses to them helps to
identify weaknesses in security systems and
response processes, allowing for the necessary
adjustments to be made to improve
performance. This is a continuous
improvement process that helps to strengthen
protection and reduce the risk of future
incidents.</p>
      <p>▪ Collaboration and information sharing:
SIEM can facilitate collaboration between
CSIRT team members and other stakeholders
by providing shared access to incident
information, analytics, and reports. Also,
sharing threat information with other
organizations and communities can help
develop better defense strategies.</p>
      <p>▪ Compliance with legal and regulatory
requirements: Using an SIEM helps companies
meet legal and regulatory requirements by
providing the necessary reporting, auditing,
and monitoring of security standards.</p>
      <p>
        ▪ Incident prediction and prevention:
Analyzing the data collected and processed by
an SIEM can help predict potential threats and
develop strategies to prevent them, reducing
the likelihood of future incidents [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1–5</xref>
        ].
      </p>
      <p>SIEM operation is based on the use of
databases (DBs), i.e. structured data stored
digitally in a computer system. The DB is
managed by a database management system
(DBMS). The data, together with the DBMS and
associated applications, form a DB system.
Modern types of DBs usually store data in the
form of tables, where information is presented
in the form of rows and columns. This
information can be easily managed, added,
edited, deleted, updated, monitored, etc. Most
modern DBs use a structured query language
(SQL) to enter records and retrieve
information.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review and Problem Statement</title>
      <p>
        There are many modern DB types [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6–8</xref>
        ]. The
choice of DB type for a particular SIEM system is
determined by the specifics of how the data will
be used in a particular context. Templates and
structures used to organize information in DBMS
are referred to as DB types [
        <xref ref-type="bibr" rid="ref10 ref6 ref7 ref8 ref9">6–10</xref>
        ].
      </p>
      <p>
        Studies [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11–13</xref>
        ] have analyzed modern DB
types used in SIEM systems to identify their
strengths and weaknesses and have
systematized them as follows:
      </p>
      <sec id="sec-2-1">
        <title>2.1. The simplest DB types</title>
        <p>Let's start by looking at three types of DBs
that are still found in specialized environments
but have been largely replaced by reliable and
efficient alternatives.</p>
        <p>2.1.1. Simple data structures. The first and
simplest way of storing data is in text files. This
method is still used today for working with
small amounts of information. A special
character is used to separate fields: a comma or
semicolon in CSV files, and a colon or space in
*nix-like systems.</p>
        <p>2.1.2. Hierarchical DBs. Unlike text tables,
hierarchical DBs define relationships between
objects. Each record other than the root has a
single ancestor (parent). This creates a tree structure in which
records are classified according to their
relationship to a lower level of the record chain.</p>
        <p>2.1.3. Network DBs. Network DBs extend the
functionality of hierarchical DBs by allowing
records to have more than one ancestor. This
makes it possible to model complex
relationships.</p>
        <p>2.2. Relational DBs</p>
        <p>2.2.1. SQL. Relational DBs are among the oldest
general-purpose DBs and are still widely used. Data in
relational DBs is structured in the form of
tables, which are made up of columns and
rows. Each column in the table has its own
name and data type. Each row represents a
separate record or item of information in the
table, containing the value for each of the
columns.</p>
        <p>2.2.2. OLTP. Online transaction processing (OLTP)
DBs are designed to execute business transactions
issued concurrently by multiple users.</p>
        <p>
          Relational DBs are used by the following
SIEM systems: IBM QRadar, AlienVault USM,
LogRhythm, AlienVault OSSIM, Splunk,
FortiSIEM, Wazuh, SolarWinds, ManageEngine,
RuSIEM, Prelude OSS, Prelude SIEM, Sagan,
Maxpatrol, EventTracker, Trustwave SIEM
Enterprise, and McAfee (ESM) [
          <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15 ref16">11–16</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. NoSQL DBs</title>
        <p>NoSQL is a group of DB types that offer
approaches other than the standard relational
model. NoSQL stands for "non-SQL" or "not
only SQL" to indicate that SQL-like queries are
sometimes allowed. A NoSQL non-relational
database allows you to store and process
unstructured or semi-structured data (unlike a
relational DB, which defines the structure of
the data it contains). The popularity of NoSQL
is growing as web applications proliferate and
become more complex.</p>
        <p>2.3.1. Key-value DBs. To store information
in a key-value DB, you specify a key and
a data object to store, for example a JSON
object, an image, or text. To retrieve data, the
key is sent and the stored object is returned
as an opaque blob.</p>
        <p>
          2.3.2. Document-oriented DBs.
Document-oriented DBs (document DBs or document
repositories) share the basic access and lookup
semantics of key-value stores.
Such DBs also use a key to uniquely identify
data. The difference between key-value stores
and document DBs is that document DBs store
data in structured formats (JSON, BSON, or
XML) rather than as opaque blobs [
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ].
        </p>
        <p>2.3.3. Graph DBs. Instead of representing
relationships using tables and foreign keys,
graph DBs establish relationships using nodes,
edges, and properties. Graph DBs represent
data in terms of individual nodes, which can
have any number of properties associated with
them. Graph DBs store data in the context of
entities and relationships between entities.</p>
        <p>2.3.4. Columnar DBs. Columnar DBs, also
known as non-relational columnar storage or
wide column DBs, belong to the category of
NoSQL systems but look like relational DBs.
Similar to relational DBs, columnar DBs store
data in the form of rows and columns, but have
a different structure of relationships between
elements. In relational databases, all rows must
follow a fixed schema. The schema determines
which columns will be in the table, their data
types, and other characteristics. Columnar DBs,
on the other hand, have structures called
"column families" instead of tables. Column
families contain rows, each of which can have its
own format. Each row consists of a unique identifier
used for searching, followed by a set of column
names and values.</p>
        <p>2.3.5. Time series DBs. Such DBs are
designed to collect and manage items that
change over time. Most such DBs are organized
into structures that record values for an
element. For example, you might create a table
to track the temperature of a processor. Inside,
the values consist of a time stamp and a
temperature value.</p>
        <p>NoSQL is used by the following SIEMs:
AlienVault USM, AlienVault OSSIM, MozDef,
Maxpatrol, and SearchInform SIEM.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.4. Combined DBs</title>
        <p>NewSQL and multi-model DBs are different
types of databases, but they aim to solve a
common set of problems that arise from using
opposing SQL or NoSQL strategies.</p>
        <p>
          2.4.1. NewSQL DBs. NewSQL inherits the
relational structure and semantics but is built
using more modern, scalable designs. The goal
is to provide greater scalability than relational
DBs and higher consistency guarantees than
NoSQL. The trade-off between consistency and
availability is a fundamental problem in
distributed DBs, described by the CAP theorem
[
          <xref ref-type="bibr" rid="ref19">19, 20</xref>
          ].
        </p>
        <p>2.4.2. Multi-model DBs. Such DBs combine
the functionality of several types of DBs. The
advantages of this approach are the following.
The same system can use different
representations for different types of data.</p>
        <p>Combining data from different types of DBs in
one system allows new operations that would
otherwise be difficult or impossible [21].</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.5. Object-oriented DBs</title>
        <p>Information in an object-oriented database
(OODB) is represented as an object, as in
object-oriented programming.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.6. Cloud DBs</title>
        <p>
          A cloud DB is a set of structured or
unstructured data hosted on a private, public, or
hybrid cloud computing platform [
          <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15">11–15, 22–
24</xref>
          ]. There are two types of cloud DB models:
traditional DB and DB as a service (DBaaS). In
the DBaaS model, administrative tasks and
maintenance are performed by the cloud
provider [25, 26].
        </p>
        <p>Cloud DBs are used by the following SIEM systems: HPE
ArcSight, Splunk, Ixia ThreatARMOR, Micro
Focus ArcSight, and Trustwave SIEM
Enterprise.</p>
        <p>
          The results of the analysis of DBMSs used in
different SIEM systems according to [
          <xref ref-type="bibr" rid="ref11 ref12 ref13">11–13</xref>
          ]
are shown in Table 1.
        </p>
        <sec id="sec-2-5-2">
          <p>[Table 1. DBMSs used in different SIEM systems. The table body was not recovered; the surviving entries are IBM QRadar, LogRhythm, and Splunk.]</p>
          <p>
According to the analysis in Table 1, each
specific type of DBMS remains relevant in its
own area, where the relationships between
data are determined by the specific structure
of the DB. Attention should be paid to the
possibility of using hybrid DBs that combine
different types of DBMS, such as SQL and
NoSQL, which will allow maintaining
convenience in storing and classifying data, as
well as ensuring high speed in obtaining large
amounts of information due to preliminary
indexing. In addition, the need to develop a
new data store model has been substantiated,
which, on the one hand, should store, process
and search events in logs at the highest
possible speed, and, on the other hand, should
store service data about users, metadata,
configuration settings, hashed thread counters
and an alert archive in a reliable and structured
manner.</p>
          <p>The aim of the study is therefore to develop
a model of an ontological-relational data store
for use in a cloud-based SIEM system. To
achieve this aim, correlation-regression
multifactor analysis is used.</p>
          <p>3. Model Development: DBs Correlation Analysis and Defining</p>
          <p>An ontological-relational data store (Fig. 1) is a
data management system that combines the
concepts of ontologies and relational DBs to
store and manage information. This paradigm
combines two main ideas [27]:</p>
          <p>Ontology: An ontology is a formally defined
knowledge model or semantic framework used
to describe objects, concepts, and their
interactions in a particular domain. Ontologies
define the semantics of data and help to
understand how data is related to each other.</p>
          <p>Relational data: Relational database
management systems (RDBMS) store
data in the form of tables with
relationships between them. They use the SQL
query language to access data.</p>
          <p>In an ontological-relational data store,
information is stored in the form of tables (as
in relational DB) but is additionally
accompanied by an ontology that provides a
semantic context for the data. The ontology
helps to understand the meaning of data, its
relationships, and context in a broader sense.</p>
          <p>This makes it easier to search, filter, and
understand data. Ontological-relational data
stores are used in various fields, including the
semantic web, biology, medicine, geography,
and other areas where it is important to
understand the semantic context of data, as
well as to efficiently access large amounts of
data.</p>
          <p>To support the selection of the most effective
DBs used in modern SIEM systems, we use the
procedure of correlation and regression
multifactor analysis, which includes the
following stages:</p>
          <p>Stage 1: Analysis of existing factors and
justification of the type of regression
model. A general list of factors is made and
their possible numerical characteristics for
quantitative and qualitative representation are
determined. An analytical expression is
constructed to reflect the relationship between
the factor attributes and the resultant attribute:
<disp-formula><label>(1)</label><tex-math>\hat{Y} = f(x_1, x_2, x_3, \ldots, x_d),</tex-math></disp-formula>
where <inline-formula><tex-math>\hat{Y}</tex-math></inline-formula> is the resultant attribute function;
<inline-formula><tex-math>x_1, x_2, x_3, \ldots, x_d</tex-math></inline-formula> are the factor attributes of the DB.</p>
          <p>In addition, the multiple regression
equation can be represented in a linear form:</p>
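          <p><disp-formula><label>(2)</label><tex-math>\hat{Y} = a_0 + a_1 x_1 + a_2 x_2 + \ldots + a_d x_d,</tex-math></disp-formula>
where <inline-formula><tex-math>a_0, a_1, \ldots, a_d</tex-math></inline-formula> are the parameters of the
equation to be determined.</p>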
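          <p>If <inline-formula><tex-math>m</tex-math></inline-formula> observations
<inline-formula><tex-math>y_h, x_{1h}, x_{2h}, \ldots, x_{dh}</tex-math></inline-formula>,
<inline-formula><tex-math>h = 1, 2, \ldots, m</tex-math></inline-formula>, are known
for the DB factors and the resultant attribute,
then the standard least
squares method yields a system of linear algebraic
equations for estimating the
parameters of the regression equation:
<disp-formula><label>(3)</label><tex-math>\begin{cases}
a_0 m + a_1 \sum_{h=1}^{m} x_{1h} + a_2 \sum_{h=1}^{m} x_{2h} + \ldots + a_d \sum_{h=1}^{m} x_{dh} = \sum_{h=1}^{m} y_h; \\
a_0 \sum_{h=1}^{m} x_{1h} + a_1 \sum_{h=1}^{m} x_{1h}^2 + a_2 \sum_{h=1}^{m} x_{1h} x_{2h} + \ldots + a_d \sum_{h=1}^{m} x_{1h} x_{dh} = \sum_{h=1}^{m} x_{1h} y_h; \\
\ldots \\
a_0 \sum_{h=1}^{m} x_{dh} + a_1 \sum_{h=1}^{m} x_{dh} x_{1h} + a_2 \sum_{h=1}^{m} x_{dh} x_{2h} + \ldots + a_d \sum_{h=1}^{m} x_{dh}^2 = \sum_{h=1}^{m} x_{dh} y_h.
\end{cases}</tex-math></disp-formula>
The resulting system of <inline-formula><tex-math>d + 1</tex-math></inline-formula> equations
with the <inline-formula><tex-math>d + 1</tex-math></inline-formula> unknowns
<inline-formula><tex-math>a_0, a_1, \ldots, a_d</tex-math></inline-formula> can be solved by
means of linear algebra. For a large number of equations, it
is best to use Gaussian elimination with the
choice of the main (pivot) element. The matrix of
this system of linear algebraic equations is
symmetric and, provided the factors are linearly independent,
positive definite, so its solution exists and is
unique. If the number of equations is small, the
inverse matrix method can be successfully
used to solve the problem.</p>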
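          <p>The estimation step described above can be illustrated with a minimal Python sketch. It assumes small hypothetical data that is not taken from the study, builds the normal equations for the linear model, and solves them by Gaussian elimination with selection of the main (pivot) element. The function names normal_equations and solve_gauss are illustrative only.</p>
          <preformat>
```python
def normal_equations(xs, ys):
    """Build the (d+1) x (d+1) normal-equation system for y = a0 + a1*x1 + ... + ad*xd.

    xs: list of m observations, each a list of d factor values.
    ys: list of m observed values of the resultant attribute.
    """
    rows = [[1.0] + [float(v) for v in x] for x in xs]  # leading 1 stands for the intercept a0
    size = len(rows[0])
    matrix = [[sum(r[i] * r[k] for r in rows) for k in range(size)] for i in range(size)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return matrix, rhs


def solve_gauss(matrix, rhs):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        # main element: the entry with the largest absolute value in this column
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) == 0.0:
            raise ValueError("singular system: the factors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # back substitution, from the last unknown to the first
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * solution[c] for c in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (not taken from the paper)
xs = [[1, 1], [2, 1], [1, 2], [3, 2], [2, 3]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coeffs = solve_gauss(*normal_equations(xs, ys))
print([round(a, 6) for a in coeffs])
```
          </preformat>
          <p>Because the sample data is generated exactly from a linear relation, the recovered parameters should match the generating coefficients up to rounding. With real observations the residuals are non-zero and the adequacy checks of the next stage apply.</p>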
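          <p>Stage 2. Verification of the adequacy of
the model obtained. For this purpose, the following
quantities are calculated:</p>
          <p>▪ the model residuals, i.e. the differences
between the observed and calculated values:
<disp-formula><label>(4)</label><tex-math>u_h = y_h - \hat{y}_h = y_h - (a_0 + a_1 x_{1h} + a_2 x_{2h} + \ldots + a_d x_{dh}), \quad h = 1, 2, \ldots, m;</tex-math></disp-formula></p>
          <p>▪ the relative error of the residuals and
their mean:
<disp-formula><label>(5)</label><tex-math>\delta_h = \frac{u_h}{y_h} \cdot 100\%, \quad \bar{\delta} = \frac{\sum_{h=1}^{m} \delta_h}{m};</tex-math></disp-formula></p>
          <p>▪ the root mean square error of the
disturbance variance:
<disp-formula><label>(6)</label><tex-math>\sigma_u = \sqrt{\frac{\sum_{h=1}^{m} u_h^2}{m - d - 1}};</tex-math></disp-formula></p>
          <p>▪ the coefficient of determination:
<disp-formula><label>(7)</label><tex-math>R^2 = 1 - \frac{\sum_{h=1}^{m} u_h^2}{\sum_{h=1}^{m} (y_h - \bar{y})^2} \quad \text{or} \quad R^2 = 1 - \frac{\sum_{h=1}^{m} (y_h - \hat{y}_h)^2}{\sum_{h=1}^{m} (y_h - \bar{y})^2};</tex-math></disp-formula></p>
          <p>▪ the multiple correlation coefficient <inline-formula><tex-math>R</tex-math></inline-formula>,
which is the main indicator of the closeness of the
correlation between the generalized indicator and
the factors:
<disp-formula><label>(8)</label><tex-math>R = \sqrt{1 - \frac{\sum_{h=1}^{m} (y_h - \hat{y}_h)^2}{\sum_{h=1}^{m} (y_h - \bar{y})^2}}.</tex-math></disp-formula></p>
          <p>The multiple correlation coefficient <inline-formula><tex-math>R</tex-math></inline-formula> is the main
measure of the closeness of the relationship
between the resultant attribute and the set of factor
attributes. Because <inline-formula><tex-math>R</tex-math></inline-formula> is defined as a
non-negative square root, its values lie in the
interval from 0 to 1: a value of 0
indicates the absence of a linear correlation, and
<inline-formula><tex-math>R = 1</tex-math></inline-formula> corresponds to a functional
relationship between the attributes.</p>
          <p>Stage 3. Verification of the statistical
significance of the results. The test is
performed using Fisher's F-statistic with <inline-formula><tex-math>d</tex-math></inline-formula> and
<inline-formula><tex-math>(m - d - 1)</tex-math></inline-formula> degrees of freedom:
<disp-formula><label>(9)</label><tex-math>F = \frac{\sum_{h=1}^{m} (\hat{y}_h - \bar{y})^2 / d}{\sum_{h=1}^{m} (y_h - \hat{y}_h)^2 / (m - d - 1)} \quad \text{or} \quad F = \frac{R^2}{1 - R^2} \cdot \frac{m - d - 1}{d},</tex-math></disp-formula>
where <inline-formula><tex-math>d</tex-math></inline-formula> is the number of DB factors included in the
model; <inline-formula><tex-math>m</tex-math></inline-formula> is the total number of observations;
<inline-formula><tex-math>\hat{y}_h</tex-math></inline-formula> is the estimated value of the dependent
variable at the h-th observation; <inline-formula><tex-math>\bar{y}</tex-math></inline-formula> is the
average value of the dependent variable; <inline-formula><tex-math>y_h</tex-math></inline-formula> is
the value of the dependent variable at the h-th
observation; <inline-formula><tex-math>R</tex-math></inline-formula> is the multiple correlation
coefficient.</p>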
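          <p>Fisher's tables are used to find the critical
value <inline-formula><tex-math>F_{cr}</tex-math></inline-formula> with <inline-formula><tex-math>d</tex-math></inline-formula> and
<inline-formula><tex-math>(m - d - 1)</tex-math></inline-formula> degrees of
freedom. If <inline-formula><tex-math>F > F_{cr}</tex-math></inline-formula>, this indicates that the
model is adequate. If the model is inadequate,
it is necessary to return to the model-building
stage and possibly introduce additional factors
or move to a non-linear model.</p>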
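          <p>As a hedged illustration of the adequacy and significance checks, the following Python sketch fits a one-factor model to hypothetical data (not the study's data) and computes the coefficient of determination and the F-statistic in both of its forms. The helper names simple_ols and fit_quality are assumptions made for this example; the critical value still has to be taken from Fisher's tables.</p>
          <preformat>
```python
def simple_ols(xs, ys):
    """Closed-form least squares for a single factor: y = a0 + a1*x."""
    m = len(xs)
    x_mean, y_mean = sum(xs) / m, sum(ys) / m
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    a1 = s_xy / s_xx
    return y_mean - a1 * x_mean, a1


def fit_quality(ys, y_hats, d):
    """Return (R^2, F from sums of squares, F from R^2) for d factors."""
    m = len(ys)
    y_mean = sum(ys) / m
    ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # residual sum of squares
    ss_reg = sum((yh - y_mean) ** 2 for yh in y_hats)         # explained sum of squares
    ss_tot = sum((y - y_mean) ** 2 for y in ys)               # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    f_sums = (ss_reg / d) / (ss_res / (m - d - 1))
    f_r2 = r2 / (1.0 - r2) * (m - d - 1) / d
    return r2, f_sums, f_r2


# Hypothetical observations (illustration only)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a0, a1 = simple_ols(xs, ys)
y_hats = [a0 + a1 * x for x in xs]
r2, f_sums, f_r2 = fit_quality(ys, y_hats, d=1)
print(r2, f_sums, f_r2)
```
          </preformat>
          <p>For a least squares fit with an intercept, the two forms of the F-statistic agree up to rounding, which is why either can be compared with the tabulated critical value.</p>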
          <p>Stage 4. Determining the regression
coefficients, the elasticity coefficient, and the
confidence intervals for the regression
parameters. It is necessary to verify the
significance of the coefficients of the
regression equation. The test is performed
using the t-statistic, which has the form for
multivariate regression parameters:</p>
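          <p><disp-formula><label>(10)</label><tex-math>t_h = \frac{a_h}{\sigma_{a_h}},</tex-math></disp-formula>
where <inline-formula><tex-math>\sigma_{a_h}</tex-math></inline-formula> is the standard deviation of the
estimate of the h-th parameter.</p>
          <p>If the absolute value of <inline-formula><tex-math>t_h</tex-math></inline-formula> exceeds the critical value
found in the Student's t-distribution tables, the
corresponding parameter is statistically
significant and has a significant impact on the
generalizing indicator.</p>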
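          <p>Differences in the units of measurement of
the DB factors are eliminated by using partial
elasticity coefficients given by the ratio:
<disp-formula><label>(11)</label><tex-math>\varepsilon_h = \frac{\partial \hat{y}}{\partial x_h} \cdot \frac{\bar{x}_h}{\bar{y}},</tex-math></disp-formula>
where <inline-formula><tex-math>\bar{x}_h</tex-math></inline-formula> is the average value of the h-th
factor; <inline-formula><tex-math>\bar{y}</tex-math></inline-formula> is the average value of the
resultant attribute.</p>
          <p>The partial elasticity coefficient <inline-formula><tex-math>\varepsilon_h</tex-math></inline-formula>
indicates by what percentage the outcome variable
changes on average for a 1% change in factor
<inline-formula><tex-math>x_h</tex-math></inline-formula>, while holding the other factors constant.</p>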
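          <p>The confidence interval at the level of
reliability <inline-formula><tex-math>(1 - \alpha)</tex-math></inline-formula> is an interval with randomly
determined boundaries that covers the true
value of the coefficient of the regression
equation <inline-formula><tex-math>a_h</tex-math></inline-formula> with the level of confidence <inline-formula><tex-math>(1 - \alpha)</tex-math></inline-formula>
and is specified by the dependencies:
<disp-formula><label>(12)</label><tex-math>\left(a_h - t_{\alpha/2, z}\,\sigma_{a_h};\; a_h + t_{\alpha/2, z}\,\sigma_{a_h}\right),</tex-math></disp-formula>
where <inline-formula><tex-math>t_{\alpha/2, z}</tex-math></inline-formula> is Student's statistic with
<inline-formula><tex-math>z = m - d - 1</tex-math></inline-formula> degrees of freedom and <inline-formula><tex-math>\alpha</tex-math></inline-formula> as
the significance level; <inline-formula><tex-math>\sigma_{a_h}</tex-math></inline-formula> is the standard deviation of
the <inline-formula><tex-math>a_h</tex-math></inline-formula> parameter estimate.</p>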
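          <p>Therefore, consider <inline-formula><tex-math>s</tex-math></inline-formula> random variables
<inline-formula><tex-math>x_1, x_2, \ldots, x_r, \ldots, x_s</tex-math></inline-formula> (the parameters under study),
each represented by a sample of <inline-formula><tex-math>v</tex-math></inline-formula> values
<inline-formula><tex-math>x_r = \{x_{r1}, x_{r2}, \ldots, x_{rz}, \ldots, x_{rv}\}</tex-math></inline-formula>. For each pair of
random variables <inline-formula><tex-math>x_r</tex-math></inline-formula> and <inline-formula><tex-math>x_w</tex-math></inline-formula>, the empirical
linear correlation coefficient <inline-formula><tex-math>r_{rw}</tex-math></inline-formula> can be
estimated. The values of the
coefficients obtained are written in a matrix of
size <inline-formula><tex-math>s \times s</tex-math></inline-formula>:
<disp-formula><label>(13)</label><tex-math><![CDATA[\begin{pmatrix}
1 & r_{12} & \cdots & r_{1w} & \cdots & r_{1s} \\
r_{21} & 1 & \cdots & r_{2w} & \cdots & r_{2s} \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\
r_{r1} & r_{r2} & \cdots & 1 & \cdots & r_{rs} \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\
r_{s1} & r_{s2} & \cdots & r_{sw} & \cdots & 1
\end{pmatrix}.]]></tex-math></disp-formula></p>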
          <p>All values of the correlation coefficient
range from -1 to 1. The sign of the coefficient
indicates the 'direction' of the relationship: a
positive value indicates a 'direct' relationship,
a negative value indicates an 'inverse'
relationship, and a value of '0' indicates the
absence of a linear correlation. When r = 1 or
r = −1, there is a functional relationship
between the attributes.</p>
          <p>The multiple correlation coefficient is the
main indicator of the closeness of the
relationship between a set of databases DB and
a set of factor attributes (main selection
criteria) EC. The Chaddock scale is used to
assess the strength of the relationship.</p>
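          <p>The pairwise correlation matrix and the verbal assessment of strength can be sketched in Python as follows. The sketch assumes the Pearson formula for the empirical linear correlation coefficient and the commonly cited Chaddock bands (0.1, 0.3, 0.5, 0.7, 0.9); the sample values and the band labels are illustrative and are not taken from the study.</p>
          <preformat>
```python
def pearson(a, b):
    """Empirical linear (Pearson) correlation coefficient of two equal-length samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def correlation_matrix(samples):
    """s x s matrix of pairwise coefficients with ones on the diagonal."""
    return [
        [1.0 if i == j else pearson(a, b) for j, b in enumerate(samples)]
        for i, a in enumerate(samples)
    ]


def chaddock(r):
    """Verbal strength of a correlation using commonly cited Chaddock bands."""
    strength = abs(r)
    bands = [(0.9, "very high"), (0.7, "high"), (0.5, "noticeable"),
             (0.3, "moderate"), (0.1, "weak")]
    for lower, label in bands:
        if strength >= lower:
            return label
    return "negligible"


# Hypothetical samples of three parameters (illustration only)
samples = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [1, 3, 2, 5, 4],
]
matrix = correlation_matrix(samples)
for row in matrix:
    print([round(value, 3) for value in row], [chaddock(value) for value in row])
```
          </preformat>
          <p>The matrix is symmetric, so in practice only the upper triangle needs to be computed; the full form is kept here to mirror the matrix notation used in the text.</p>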
          <p>Thus, using the above calculation procedure
of correlation and regression analysis, we have
examined the set of DB used in modern SIEM
systems according to the main EC criteria.</p>
          <p>
            In Table 1, the set of DB defined in the works
[
            <xref ref-type="bibr" rid="ref11 ref12 ref13">11–13, 28</xref>
            ] is presented in the following form:
          </p>
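          <p><disp-formula><label>(14)</label><tex-math>\mathit{DB} = \left\{ \bigcup_{i=1}^{n} \mathit{DB}_i \right\} = \{ \mathit{DB}_1, \mathit{DB}_2, \ldots, \mathit{DB}_n \},</tex-math></disp-formula>
where <inline-formula><tex-math>\mathit{DB}_i \subseteq \mathit{DB} \; (i = \overline{1, n})</tex-math></inline-formula> are the types of DBMS
used in certain SIEM systems and <inline-formula><tex-math>n</tex-math></inline-formula> is the total
number of databases.</p>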
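          <p>According to the analyzed systems, with
<inline-formula><tex-math>n = 34</tex-math></inline-formula>, considering (14), let us define the set of
databases as follows:
<disp-formula><tex-math>\mathit{DB} = \left\{ \bigcup_{i=1}^{34} \mathit{DB}_i \right\} = \{ \mathit{DB}_1, \mathit{DB}_2, \ldots, \mathit{DB}_{34} \},</tex-math></disp-formula>
where DB<sub>1</sub> – Ariel database, DB<sub>2</sub> – PostgreSQL,
DB<sub>3</sub> – SQLite, DB<sub>4</sub> – Oracle, DB<sub>5</sub> – SQL Server,
DB<sub>6</sub> – MySQL, DB<sub>7</sub> – DB2/Linux, DB<sub>8</sub> –
Informix, DB<sub>9</sub> – MemSQL, DB<sub>10</sub> – AWS Aurora,
DB<sub>11</sub> – Microsoft SQL Server, DB<sub>12</sub> – AWS
RedShift, DB<sub>13</sub> – SAP SQL Anywhere, DB<sub>14</sub> –
Sybase ASE, DB<sub>15</sub> – Sybase IQ, DB<sub>16</sub> – Teradata,
DB<sub>17</sub> – MSSQL, DB<sub>18</sub> – Data Access Server
(DAS), DB<sub>19</sub> – DB2/UDB, DB<sub>20</sub> – RedisDB, DB<sub>21</sub>
– Rap Sheet, DB<sub>22</sub> – RabbitMQ, DB<sub>23</sub> –
MongoDB, DB<sub>24</sub> – Elasticsearch, DB<sub>25</sub> –
Kibana, DB<sub>26</sub> – MS SQL Express, DB<sub>27</sub> –
MariaDB, DB<sub>28</sub> – SQL, DB<sub>29</sub> – DB2, DB<sub>30</sub> – Own
development CORR-E, DB<sub>31</sub> – Microsoft SQL
Azure, DB<sub>32</sub> – SYBASE, DB<sub>33</sub> – IBM, DB<sub>34</sub> –
Hadoop, as advised in [12].</p>
        </sec>
        <sec id="sec-2-5-3">
          <p>The set DB was studied according to the main criteria defined by the set EC:</p>
          <p><disp-formula><label>(15)</label><tex-math>\mathit{EC} = \left\{ \bigcup_{j=1}^{q} \mathit{EC}_j \right\} = \{ \mathit{EC}_1, \mathit{EC}_2, \ldots, \mathit{EC}_q \},</tex-math></disp-formula>
where <inline-formula><tex-math>\mathit{EC}_j \subseteq \mathit{EC} \; (j = \overline{1, q})</tex-math></inline-formula> is a category of
criteria for evaluating the most efficient
DBMSs and <inline-formula><tex-math>q</tex-math></inline-formula> is the total number of criteria.</p>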
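          <p>Therefore, for <inline-formula><tex-math>q = 7</tex-math></inline-formula>, considering (15), let us
define the set of proposed criteria EC:
<disp-formula><tex-math>\mathit{EC} = \left\{ \bigcup_{j=1}^{7} \mathit{EC}_j \right\} = \{ \mathit{EC}_1, \mathit{EC}_2, \mathit{EC}_3, \mathit{EC}_4, \mathit{EC}_5, \mathit{EC}_6, \mathit{EC}_7 \} = \{ \mathit{EC}_{HOS}, \mathit{EC}_{FL}, \mathit{EC}_{FS}, \mathit{EC}_{STD}, \mathit{EC}_{SCD}, \mathit{EC}_{SSQL}, \mathit{EC}_{DBAAS} \},</tex-math></disp-formula>
where EC<sub>1</sub> is a highly organized structure, EC<sub>2</sub>
is flexibility, EC<sub>3</sub> is quick access, EC<sub>4</sub> is
support for different types of data, EC<sub>5</sub> is
the ability to store configuration data, EC<sub>6</sub> is
structured query language support, and EC<sub>7</sub> is
DBaaS support (support for cloud technologies).</p>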
          <p>As a result of the
correlation-regression multifactor analysis
(the scheme is presented in Fig. 2), the two most effective databases
were identified:</p>
          <p>1) DB<sub>23</sub>, which is MongoDB;</p>
          <p>2) DB<sub>24</sub>, which is Elasticsearch.</p>
          <p>4. Ontology-Relational Data Store Model Development</p>
          <p>The ontological-relational data store model
consists of two types of DBs:</p>
          <p>1) DB Type 1</p>
          <p>Purpose: fast processing of logs.</p>
          <p>To solve this problem, we chose the
open-source Elasticsearch technology. Elasticsearch
is well suited to working with logs. After
indexing, it is possible to search, sort, and filter
whole documents, not just individual columns of data. This
reflects a different approach to data
retrieval and allows Elasticsearch to
perform complex full-text searches.</p>
          <p>Documents are represented as JSON
objects. Serialization to JSON (the process of
translating a data structure into a sequence
of bytes) is supported by most programming
languages, and JSON is a standard format for NoSQL systems.</p>
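          <p>The idea of storing log events as JSON documents and searching them after indexing can be illustrated with a toy Python sketch. It is a simplified inverted index built with the standard library only and is not how Elasticsearch or Lucene is implemented; the event fields (id, host, message) and the search helper are hypothetical.</p>
          <preformat>
```python
import json
from collections import defaultdict

# Hypothetical log events; the field names are illustrative, not a schema from the paper
events = [
    {"id": 1, "host": "web-01", "message": "failed login for user admin"},
    {"id": 2, "host": "db-01", "message": "connection accepted"},
    {"id": 3, "host": "web-02", "message": "failed password check for user root"},
]

# Serialization: each event becomes a JSON string that can be stored or sent over the network
documents = [json.dumps(event, sort_keys=True) for event in events]

# Indexing: map every term of the message field to the ids of the documents containing it
inverted_index = defaultdict(set)
for raw in documents:
    doc = json.loads(raw)
    for term in doc["message"].lower().split():
        inverted_index[term].add(doc["id"])


def search(*terms):
    """Return the sorted ids of documents whose message contains all given terms."""
    hits = [inverted_index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*hits)) if hits else []


print(search("failed", "user"))
```
          </preformat>
          <p>The lookup touches only the posting sets of the query terms rather than scanning every document, which is the reason preliminary indexing speeds up retrieval from large log volumes.</p>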
          <p>Elasticsearch is an open-source full-text
search platform based on the Lucene library
and written in Java. It is designed to perform
complex document/file-based searches. In
Elasticsearch, the counterpart of a table is called an index, and the
process of loading documents is called
indexing. It can be thought of as both a
non-relational document store in JSON format and
a search engine based on Lucene's full-text
search. Official clients are available for Java,
.NET (C#), Python, Groovy, JavaScript, PHP, Perl,
and Ruby. Elasticsearch is developed by Elastic
and distributed under an open license. The
Java code has been modified for the current
model.</p>
          <p>2) DB Type 2</p>
          <p>Purpose: reliable storage of proprietary
information.</p>
          <p>To achieve this, we chose the open-source
MongoDB technology. MongoDB is a
document-oriented DBMS that does not
require a table schema description. It is
considered one of the classic examples of a
NoSQL system that uses JSON-like documents
with an optional schema. Written in C++, it is used in
web development, particularly as part of the
JavaScript-oriented MEAN stack.</p>
          <p>The system can work with a replica set,
i.e. two or more copies of the data on different
nodes. Each member of the replica set can act
as the primary or a secondary replica at any time.
By default, all write and read operations are
performed on the primary replica. Secondary
replicas keep their copies of the data up to date. If the
primary replica fails, the replica set elects
which secondary should become the new primary.
Secondary replicas can also be a source for
read operations.</p>
          <p>The system is scaled horizontally by
segmenting (sharding) DB objects and
distributing their parts across different cluster
nodes. The administrator chooses the
segmentation key, which determines the
criteria by which data is distributed to the
nodes (for example, according to the hash values of the
segmentation key). Because any node in
the cluster can receive requests, the load is
balanced across the cluster. The system can also be used as file
storage with load balancing and data
replication (the Grid File System, GridFS, function):
GridFS splits a file into chunks and stores each chunk
as a separate document. In
addition, software tools are provided for
working with files and their contents, and GridFS is
used in plugins for Nginx and lighttpd. MongoDB versions
released before October 2018 are licensed under the AGPL;
later versions are distributed under the Server Side Public
License (SSPL).</p>
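          <p>Both ideas in the preceding paragraph, distributing records by the hash of a segmentation key and storing a file as separate chunk documents, can be illustrated with a short Python sketch. It rests on several assumptions: the function names are hypothetical, the hash-modulo placement is a simplification (MongoDB's hashed sharding assigns ranges of hash values to chunks rather than using a modulo), and the chunk documents only mimic the <monospace>files_id</monospace>, <monospace>n</monospace>, and <monospace>data</monospace> fields used by GridFS.</p>
          <preformat>
```python
import hashlib


def shard_for(shard_key_value, shard_count):
    """Map a segmentation-key value to a shard index via a stable hash."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def split_into_chunks(file_id, data, chunk_size):
    """Split raw bytes into numbered chunk documents, in the style of GridFS."""
    return [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the file by concatenating chunks in order of their index."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


# Distribute hypothetical event ids over three shards and show the spread.
placement = {}
for event_id in range(1000):
    placement.setdefault(shard_for(event_id, 3), []).append(event_id)
print({shard: len(ids) for shard, ids in sorted(placement.items())})

# Store a 1000-byte payload as chunks of at most 256 bytes each.
payload = bytes(range(250)) * 4
chunks = split_into_chunks("report-1", payload, 256)
print(len(chunks), reassemble(chunks) == payload)   # prints 4 True
```
</preformat>
          <p>A stable hash spreads consecutive key values over the shards fairly evenly, which is why a hashed segmentation key is a common choice when the natural key grows monotonically, as event identifiers and timestamps do.</p>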
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Implementation of the Proposed Model</title>
      <p>
        The model proposed in this paper (Figure 1)
can be implemented as part of an event
correlation and cybersecurity incident
management system [
        <xref ref-type="bibr" rid="ref13">13, 29</xref>
        ]. When scaling
resources in the developed SIEM, there are
several practical rules:
      </p>
      <p>▪ SIEM nodes are CPU-bound, so processor
performance is the priority. They also serve the browser-based
user interface;</p>
      <p>▪ Elasticsearch nodes should have as much
RAM as possible and the fastest disks available,
because their performance is bounded by I/O speed;</p>
      <p>▪ MongoDB stores meta-information and
configuration data and is not resource-intensive;</p>
      <p>▪ received messages are only stored in
Elasticsearch.</p>
      <p>The main task of an ontological-relational
data store for SIEM is to combine the operation
of two types of DBs while preserving the
possibility of clustering DBs of both types.</p>
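      <p>The division of responsibilities between the two DB types can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the implementation of the developed SIEM: the class <monospace>ToyDataStore</monospace> and its methods are hypothetical, a Python list stands in for Elasticsearch, and a dictionary stands in for MongoDB.</p>
      <preformat>
```python
class ToyDataStore:
    """In-memory stand-in for a data store that combines two DB types."""

    def __init__(self):
        self.search_store = []   # plays the role of Elasticsearch
        self.meta_store = {}     # plays the role of MongoDB

    def save_message(self, message):
        """Received log messages are kept only in the search store."""
        self.search_store.append(message)

    def save_setting(self, key, value):
        """Meta-information and configuration go to the document store."""
        self.meta_store[key] = value

    def messages_from(self, source):
        """Return messages for a source, provided the source is configured."""
        if source not in self.meta_store.get("sources", []):
            raise KeyError(f"unknown source: {source}")
        return [m for m in self.search_store if m["source"] == source]


store = ToyDataStore()
store.save_setting("sources", ["fw-01", "web-02"])
store.save_message({"source": "fw-01", "text": "Port scan detected"})
store.save_message({"source": "web-02", "text": "Login succeeded"})
print(store.messages_from("fw-01"))
```
</preformat>
      <p>Keeping the two roles behind one interface lets each backing store be clustered and sized independently, in line with the scaling rules listed above.</p>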
      <p>The proposed approach to organizing the
ontological-relational data
store for a SIEM system allows the
indexing service to access external data stores
(with the data correctly indexed and
correctly displayed in search results) and to scale
(cluster) as the data volume grows. It
supports different kinds of queries (simple,
complex, structured) over different types
of data, and it enables aggregation, analysis,
and the collection of entities and patterns, which simplifies
searches and keeps search speed high.</p>
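      <p>As an illustration of the query kinds mentioned above, the following sketch writes three request bodies in the Elasticsearch Query DSL as plain Python dictionaries. The <monospace>match</monospace>, <monospace>bool</monospace>, <monospace>term</monospace>, <monospace>range</monospace>, and <monospace>aggs</monospace> constructs belong to the Query DSL, whereas the field names (<monospace>message</monospace>, <monospace>host</monospace>, <monospace>timestamp</monospace>) and the aggregation name are assumptions made for this example; the exact-match filter additionally assumes that <monospace>host</monospace> is mapped as a keyword field.</p>
      <preformat>
```python
import json

# Simple query: full-text match on a single field.
simple_query = {"query": {"match": {"message": "failed login"}}}

# Complex query: boolean combination of a full-text clause with
# exact-value and time-range filters.
complex_query = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "failed login"}}],
            "filter": [
                {"term": {"host": "fw-01"}},
                {"range": {"timestamp": {"gte": "now-1h"}}},
            ],
        }
    }
}

# Aggregation: count matching events per host without returning the hits.
aggregation_query = {
    "size": 0,
    "query": {"match": {"message": "failed login"}},
    "aggs": {"events_per_host": {"terms": {"field": "host"}}},
}

queries = [
    ("simple", simple_query),
    ("complex", complex_query),
    ("aggregation", aggregation_query),
]
for name, body in queries:
    print(name, json.dumps(body))
```
</preformat>
      <p>Each dictionary serializes to the JSON body of a search request, so the same structures could be passed to a client library or sent over HTTP; the sketch itself only prints them.</p>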
      <p>In addition, a SIEM based on this model can
work with a replica set (i.e. keep two or
more copies of the data on different nodes), scale
horizontally by segmenting
DB objects, and serve as file storage with
load balancing and data replication (the Grid File
System function).</p>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusions</title>
      <p>This paper has analyzed the modern types of
DBs used in SIEM systems and shown that each
type of DB remains relevant in its own area, where
the relationships between data are determined
by the specific structure of the DBMS. When
choosing a DB on which to build a SIEM system, it is
important to consider factors such as ease of
data storage, speed of data retrieval, and ease
of use. It is also worth considering the
possibility of integration with other system
modules and external APIs, so that a variety
of DBs can be supported for most DPI (deep packet
inspection) systems, i.e. comprehensive content analyzers, both
software and hardware. In addition, the
possibility of using hybrid DBs that combine
different types, such as SQL and NoSQL, should
be considered.</p>
      <p>A model of an ontological-relational data
store has been developed. It uses two different
DBs, Elasticsearch and MongoDB, with
appropriate characteristics. The model makes
storing and classifying data more convenient and ensures fast
retrieval of large amounts of information
through pre-indexing, horizontal scaling by
segmenting DB objects, load
balancing, and data replication [30].</p>
      <p>[29] A. Tikhomirov, et al., Network Society:
Aggregate Topological Models,
Communications in Computer and Information
Science, Springer International
Publishing, vol. 487 (2014) 415–421.</p>
      <p>[30] M. Nabil, et al., SIEM Selection Criteria
for an Efficient Contextual Security, in:
International Symposium on Networks,
Computers and Communications,
Marrakech, Morocco (2017) 1–6, doi:
10.1109/ISNCC.2017.8072035.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Anakhov</surname>
          </string-name>
          , et al.,
          <article-title>Increasing the Functional Network Stability in the Depression Zone of the Hydroelectric Power Station Reservoir</article-title>
          ,
          <source>in: Workshop on Emerging Technology Trends on the Smart Industry and the Internet of Things</source>
          , vol.
          <volume>3149</volume>
          (
          <year>2022</year>
          )
          <fpage>169</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Anakhov</surname>
          </string-name>
          , et al.,
          <article-title>Evaluation Method of the Physical Compatibility of Equipment in a Hybrid Information Transmission Network</article-title>
          ,
          <source>Journal of Theoretical and Applied Information Technology</source>
          <volume>100</volume>
          (
          <issue>22</issue>
          ) (
          <year>2022</year>
          )
          <fpage>6635</fpage>
          -
          <lpage>6644</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sokolov</surname>
          </string-name>
          , et al.,
          <article-title>Method for Increasing the Various Sources Data Consistency for IoT Sensors</article-title>
          , in: IEEE 9th International Conference on Problems of Infocommunications, Science and Technology (
          <year>2023</year>
          )
          <fpage>522</fpage>
          -
          <lpage>526</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/PICST57299.2022.10238518</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hulak</surname>
          </string-name>
          , et al.,
          <article-title>Dynamic Model of Guarantee Capacity and Cyber Security Management in the Critical Automated Systems</article-title>
          ,
          <source>in: 2nd International Conference on Conflict Management in Global Information Networks</source>
          , vol.
          <volume>3530</volume>
          (
          <year>2022</year>
          )
          <fpage>102</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Astapenya</surname>
          </string-name>
          , et al.,
          <article-title>Analysis of Ways and Methods of Increasing the Availability of Information in Distributed Information Systems</article-title>
          , in: IEEE 8th International Conference on Problems of Infocommunications, Science and Technology (
          <year>2021</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1109/picst54195.2021.9772161</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vielberth</surname>
          </string-name>
          ,
          <article-title>A Security Information and Event Management Pattern</article-title>
          ,
          <source>in: 12th Latin American Conference on Pattern Languages of Programs</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Makwana</surname>
          </string-name>
          ,
          <article-title>A Study on Critical Capabilities for Security Information and Event Management</article-title>
          ,
          <source>Int. J. Sci. Res</source>
          .
          <volume>4</volume>
          (
          <issue>7</issue>
          ) (
          <year>2015</year>
          )
          <fpage>1893</fpage>
          -
          <lpage>1896</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Karlzén</surname>
          </string-name>
          ,
          <article-title>An Analysis of Security Information and Event Management Systems</article-title>
          . Department of Computer Science and Engineering Chalmers University of Technology University of Gothenburg, Göteborg, Sweden (
          <year>2009</year>
          ). http://publications.lib.chalmers.se/records/fulltext/89572.pdf
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ribolovlev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Karasov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Polyakov</surname>
          </string-name>
          ,
          <article-title>Classification of Emergency Management Systems for Incidents without Baking</article-title>
          ,
          <source>Food Cyber Secur</source>
          .
          <volume>3</volume>
          (
          <issue>27</issue>
          ) (
          <year>2018</year>
          )
          <fpage>47</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <article-title>Ariel Query Language Guide</article-title>
          ,
          <source>IBM QRadar 7.3.3</source>
          . https://www.ibm.com/docs/en/SS42VS_7.3.3/com.ibm.qradar.doc/b_qradar_aql.pdf
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>S.</given-names> <surname>Gnatyuk</surname></string-name>, et al.,
          <article-title>Modern Types of Databases for SIEM System Development</article-title>,
          <source>in: Cybersecurity Providing in Information and Telecommunication Systems II</source>, vol.
          <volume>3187</volume> (<year>2021</year>) <fpage>127</fpage>-<lpage>138</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>S.</given-names> <surname>Gnatyuk</surname></string-name>, et al.,
          <article-title>Model of Information Technology for Efficient Data Processing in Cloud-based Malware Detection Systems of Critical Information Infrastructure</article-title>,
          <source>in: Cybersecurity Providing in Information and Telecommunication Systems</source>, vol.
          <volume>3421</volume> (<year>2023</year>) <fpage>206</fpage>-<lpage>213</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>S.</given-names> <surname>Gnatyuk</surname></string-name>, et al.,
          <article-title>Event Correlation and Incident Management System for Cybersecurity of Critical Infrastructure Objects</article-title>,
          <source>Cybersecur. Edu. Sci. Technol.</source>
          <volume>3</volume>(<issue>19</issue>) (<year>2023</year>) <fpage>176</fpage>-<lpage>196</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>S.</given-names> <surname>Sekharan</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Kandasamy</surname></string-name>,
          <article-title>Profiling SIEM Tools and Correlation Engines for Security Analytics</article-title>,
          <source>in: International Conference on Wireless Communications, Signal Processing and Networking</source>, Chennai, India
          (<year>2017</year>) <fpage>717</fpage>-<lpage>721</lpage>. doi:
          <pub-id pub-id-type="doi">10.1109/WiSPNET.2017.8299855</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name><given-names>J.</given-names> <surname>Lee</surname></string-name>, et al.,
          <article-title>Toward the SIEM Architecture for Cloud-based Security Services</article-title>,
          <source>in: IEEE Conference on Communications and Network Security (CNS)</source>, Las Vegas, NV
          (<year>2017</year>) <fpage>398</fpage>-<lpage>399</lpage>. doi:
          <pub-id pub-id-type="doi">10.1109/CNS.2017.8228696</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name><given-names>I.</given-names> <surname>Bachane</surname></string-name>,
          <string-name><given-names>Y. I. K.</given-names> <surname>Adsi</surname></string-name>,
          <string-name><given-names>H. C.</given-names> <surname>Adsi</surname></string-name>,
          <article-title>Real Time Monitoring of Security Events for Forensic Purposes in Cloud Environments using SIEM</article-title>,
          <source>in: 3rd International Conference on Systems of Collaboration (SysCo)</source>
          (<year>2016</year>) <fpage>1</fpage>-<lpage>3</lpage>, doi:
          <pub-id pub-id-type="doi">10.1109/SYSCO.2016.7831327</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name><given-names>B.</given-names> <surname>AlSabbagh</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Kowalski</surname></string-name>,
          <article-title>A Framework and Prototype for A Socio-Technical Security Information and Event Management System (ST-SIEM)</article-title>,
          <source>in: European Intelligence and Security Informatics Conference (EISIC)</source>
          (<year>2016</year>) <fpage>192</fpage>-<lpage>195</lpage>, doi:
          <pub-id pub-id-type="doi">10.1109/EISIC.2016.049</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name><given-names>A.</given-names> <surname>Serckumecka</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Medeiros</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Bessani</surname></string-name>,
          <article-title>Low-Cost Serverless SIEM in the Cloud</article-title>,
          <source>in: 38th Symposium on Reliable Distributed Systems (SRDS)</source>
          (<year>2019</year>). doi:
          <pub-id pub-id-type="doi">10.1109/SRDS47363.2019.00057</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name><given-names>M.</given-names> <surname>Nabil</surname></string-name>, et al.,
          <article-title>SIEM Selection Criteria for an Efficient Contextual Security</article-title>,
          <source>in: International Symposium on Networks, Computers and Communications (ISNCC)</source>
          (<year>2017</year>) <fpage>1</fpage>-<lpage>6</lpage>, doi:
          <pub-id pub-id-type="doi">10.1109/ISNCC.2017.8072035</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name><given-names>R.-V.</given-names> <surname>Mahmoud</surname></string-name>, et al.,
          <article-title>DefAtt-Architecture of Virtual Cyber Labs for Research and Education</article-title>,
          <source>in: International Conference on Cyber Situational Awareness Data Analytics and Assessment (CyberSA)</source>, pp.
          <fpage>1</fpage>-<lpage>7</lpage>, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name><given-names>Y.</given-names> <surname>Danik</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Hryschuk</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Gnatyuk</surname></string-name>,
          <article-title>Synergistic Effects of Information and Cybernetic Interaction in Civil Aviation</article-title>,
          <source>Aviat.</source>
          <volume>20</volume>(<issue>3</issue>) (<year>2016</year>) <fpage>137</fpage>-<lpage>144</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name><given-names>R.</given-names> <surname>Berdibayev</surname></string-name>, et al.,
          <article-title>A Concept of the Architecture and Creation for SIEM System in Critical Infrastructure</article-title>,
          <source>Stud. Syst. Decis. Control</source>
          <volume>346</volume> (<year>2021</year>) <fpage>221</fpage>-<lpage>242</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name><given-names>O.</given-names> <surname>Oksiiuk</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Chaikovska</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Fesenko</surname></string-name>,
          <article-title>Security Technique for Authentication Process in the Cloud Environment</article-title>,
          <source>in: IEEE International Scientific-Practical Conference Problems of Infocommunications, Science and Technology</source>
          (<year>2019</year>) <fpage>379</fpage>-<lpage>382</lpage>, doi:
          <pub-id pub-id-type="doi">10.1109/PICST47496.2019.9061248</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name><given-names>S.</given-names> <surname>Gnatyuk</surname></string-name>, et al.,
          <article-title>Studies on Cloud-based Cyber Incidents Detection and Identification in Critical Infrastructure</article-title>,
          <source>in: Cybersecurity Providing in Information and Telecommunication Systems</source>, vol.
          <volume>2923</volume> (<year>2021</year>) <fpage>68</fpage>-<lpage>80</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name><given-names>N.</given-names> <surname>Lukova-Chuiko</surname></string-name>, et al.,
          <article-title>Threat Hunting as a Method of Protection Against Cyber Threats</article-title>,
          <source>in: Information Technology and Interactions</source>, vol.
          <volume>2833</volume> (<year>2021</year>) <fpage>103</fpage>-<lpage>113</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name><given-names>A.</given-names> <surname>Yushko</surname></string-name>, et al.,
          <article-title>Shielding Web Application against Cyber-Attacks using SIEM</article-title>,
          <source>in: 13th International Conference on Advanced Computer Information Technologies (ACIT)</source>, Wrocław, Poland
          (<year>2023</year>) <fpage>393</fpage>-<lpage>396</lpage>, doi:
          <pub-id pub-id-type="doi">10.1109/ACIT58437.2023.10275630</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name><given-names>J.</given-names> <surname>Song</surname></string-name>, et al.,
          <article-title>Data Consistency Management in an Open Smart Home Management Platform</article-title>
          (<year>2014</year>). doi:
          <pub-id pub-id-type="doi">10.1109/EMS.2014.51</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name><given-names>A.</given-names> <surname>Polozhentsev</surname></string-name>, et al.,
          <article-title>Novel Cyber Incident Management System for 5G-based Critical Infrastructures</article-title>,
          <source>in: IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IDAACS</source>
          (<year>2023</year>) <fpage>1037</fpage>-<lpage>1041</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>