<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Analysis of Internet Service Log Data to Assess the Level of Cyber-threats in the Corporate Network</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sergey</forename><surname>Isaev</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Institute of Computational Modelling</orgName>
								<orgName type="department" key="dep2">Siberian Branch</orgName>
								<orgName type="institution">Russian Academy of Sciences</orgName>
								<address>
									<addrLine>50/44 Akademgorodok</addrLine>
									<postCode>660036</postCode>
									<settlement>Krasnoyarsk</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dmitry</forename><surname>Kononov</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Institute of Computational Modelling</orgName>
								<orgName type="department" key="dep2">Siberian Branch</orgName>
								<orgName type="institution">Russian Academy of Sciences</orgName>
								<address>
									<addrLine>50/44 Akademgorodok</addrLine>
									<postCode>660036</postCode>
									<settlement>Krasnoyarsk</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andrey</forename><surname>Malyshev</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Institute of Computational Modelling</orgName>
								<orgName type="department" key="dep2">Siberian Branch</orgName>
								<orgName type="institution">Russian Academy of Sciences</orgName>
								<address>
									<addrLine>50/44 Akademgorodok</addrLine>
									<postCode>660036</postCode>
									<settlement>Krasnoyarsk</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Analysis of Internet Service Log Data to Assess the Level of Cyber-threats in the Corporate Network</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">CBC3F5C295220AF92577E56D617BD0B1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T19:56+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Cyber-Threats</term>
					<term>Security</term>
					<term>Data Analysis</term>
					<term>Log</term>
					<term>Internet</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The article describes an analysis of the logs of Internet services of the Krasnoyarsk Science Center (Russia). The importance of log analysis as a way to improve the effectiveness of network security is shown, and the data sources are described. The study examines the following systems: NetFlow IP traffic, an intrusion prevention system, the corporate mail server, and the web server. The log data were used to determine event frequencies and to identify malicious behavior. The article describes the security threats identified during the log analysis. The results make it possible to optimize protection systems against network attacks. The measures taken to improve network security are presented.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The development of modern information technologies is raising the level of digitalization and the active use of various Internet services in scientific and business processes.</p><p>The corporate network and the services it provides have become everyday working tools without which the organization cannot fully function. Consequently, assessing cybersecurity risks and the level of cyber-threats in order to provide adequate protection is becoming increasingly relevant. An important aspect of cybersecurity is the study of security logs <ref type="bibr" target="#b0">[1]</ref>. Modern researchers use dynamic analysis methods, since traditional approaches based on static metrics may miss sophisticated low-frequency attacks <ref type="bibr" target="#b1">[2]</ref>. An important parameter of a secure system is the response time to information security incidents; minimizing this parameter, up to the point of fully preventing an attack, is described in <ref type="bibr" target="#b2">[3]</ref>. Methods and algorithms for countering mail spam are being actively developed <ref type="bibr" target="#b3">[4]</ref>. Various software is used to ensure information security: security scanners <ref type="bibr" target="#b4">[5]</ref>, complex security analysis systems <ref type="bibr" target="#b5">[6]</ref>, etc. Thus, identifying new signs of cyber-threats and developing new analysis methods is an urgent problem.</p><p>For many years, the Krasnoyarsk Science Center has been studying cybersecurity analysis and network protection <ref type="bibr" target="#b6">[7]</ref>. The purpose of this work is to analyze Internet service log data, identify potential risks, and optimize security protection mechanisms.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Data Sources</head><p>The corporate network of the Krasnoyarsk Science Center has a four-level architecture: 1) the network core, which provides routing and connectivity to external networks; 2) the server network, which hosts Internet services; 3) the aggregation level, which connects multiple organizations; 4) the local network for end users. All information about network traffic is collected at the Internet connection points and in the server network. In addition, logs are available from the main Internet services: the corporate mail server, the web server, and the proxy server for access to external resources. The data sources analyzed are the following: NetFlow IP traffic records, intrusion prevention system logs, corporate mail server logs, and web server logs.</p></div>
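<div xmlns="http://www.tei-c.org/ns/1.0"><p>Because the four sources record events in different formats and time zones, a common record type is a natural first step before any aggregation. The sketch below is a minimal illustration under assumptions, not the authors' tooling: the Event fields, the Source names, and the whitespace-separated flow line parsed by parse_flow_line are all hypothetical; real NetFlow collectors export their own formats.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Source(Enum):
    """The four log sources examined in the study."""
    NETFLOW = "netflow"
    IPS = "ips"
    MAIL = "mail"
    WEB = "web"


@dataclass(frozen=True)
class Event:
    """One normalized log event (hypothetical field set)."""
    ts: datetime      # event time, always stored in UTC
    src_ip: str       # remote address, i.e. the potential threat source
    dst_port: int     # targeted service port
    source: Source    # log the event was taken from


def parse_flow_line(line: str) -> Event:
    """Parse a hypothetical 'epoch src_ip dst_ip dst_port' flow line."""
    epoch, src_ip, _dst_ip, dst_port = line.split()
    return Event(
        ts=datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        src_ip=src_ip,
        dst_port=int(dst_port),
        source=Source.NETFLOW,
    )


ev = parse_flow_line("1578960000 198.51.100.7 192.0.2.10 23")
print(ev.ts.isoformat(), ev.src_ip, ev.dst_port, ev.source.value)
# 2020-01-14T00:00:00+00:00 198.51.100.7 23 netflow
```

<p>Storing timestamps in UTC keeps later comparisons between local (KRAT) and GMT time explicit: conversion to local time happens only at aggregation time.</p></div>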
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">IP Traffic Data Analysis</head><p>To assess the background threat level, IP traffic to unused network addresses was analyzed (Fig. <ref type="figure" target="#fig_0">1</ref>). The analysis shows a constant flow of access attempts. The daily aggregation shows the number of access attempts trending from 5,000 to 12,000 per day. The peak of 60,000 detected on 14.01.2020 is explained by the duration of the attack rather than its intensity, as the hourly aggregation shows (Fig. <ref type="figure" target="#fig_1">2</ref>). The rate of access attempts to a single address is about 500 per hour regardless of the time of day or day of the week, with peaks of up to 4,500 per hour. Thus, the time interval for confident detection of network attacks should be no more than one hour. The number of unique threat sources per day (Fig. <ref type="figure" target="#fig_2">3</ref>) ranges from 1,000 to 2,000, which indicates a large, constantly operating network used for scanning Internet services. The hourly distribution of scans (Fig. <ref type="figure" target="#fig_3">4</ref>) has an average deviation of about 1.5%, with a maximum deviation of about 5% around 07:00 KRAT. Since Krasnoyarsk Time (KRAT) is UTC+7, this maximum corresponds to 00:00 GMT, which suggests that scans and attacks are predominantly launched by systems scheduled to run at midnight GMT. The scanning frequency of individual services (Fig. <ref type="figure" target="#fig_4">5</ref>) identifies the most frequently attacked popular services, under which threats are masked: Telnet/23, MS SQL/1433, HTTP/80, Personal Agent/5555, SSH/22, HTTP Alternate/8080, RDP/3389. Thus, Telnet and MS SQL can be added to the ports already covered by the blocking system (SSH, RDP, SMTP) to increase protection efficiency.</p></div>
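<div xmlns="http://www.tei-c.org/ns/1.0"><p>The aggregations described in this section (daily and hourly counts, unique sources per day, hourly deviation, and per-service ranking) can be sketched as below. This is an illustrative sketch, not the authors' code: it assumes the NetFlow data have already been filtered to traffic towards unused addresses and reduced to (UTC timestamp, source IP, destination port) tuples, it reads "average deviation" as the mean absolute deviation of hourly counts from their mean (one plausible interpretation), and the sample events are synthetic.</p>

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone

# Krasnoyarsk Time (KRAT) is UTC+7, so 07:00 KRAT is 00:00 GMT.
KRAT = timezone(timedelta(hours=7))

# Services named in the text, keyed by their standard ports.
SERVICES = {23: "Telnet", 1433: "MS SQL", 80: "HTTP", 5555: "Personal Agent",
            22: "SSH", 8080: "HTTP Alternate", 3389: "RDP"}


def aggregate(events):
    """Aggregate (utc_datetime, src_ip, dst_port) hits on unused addresses."""
    per_day = Counter()           # attempts per KRAT calendar day
    per_hour = Counter()          # attempts per KRAT clock hour
    sources = defaultdict(set)    # distinct source IPs per KRAT day
    per_port = Counter()          # attempts per destination port
    hour_of_day = Counter()       # attempts by hour of day (0..23, KRAT)
    for ts, src, port in events:
        local = ts.astimezone(KRAT)
        per_day[local.date()] += 1
        per_hour[local.replace(minute=0, second=0, microsecond=0)] += 1
        sources[local.date()].add(src)
        per_port[port] += 1
        hour_of_day[local.hour] += 1
    unique_sources = {day: len(ips) for day, ips in sources.items()}
    return per_day, per_hour, unique_sources, per_port, hour_of_day


def hourly_deviation(hour_of_day):
    """Deviation (%) of each hour from the hourly mean, plus the mean |deviation|."""
    mean = sum(hour_of_day[h] for h in range(24)) / 24
    dev = {h: 100 * (hour_of_day[h] - mean) / mean for h in range(24)}
    return dev, sum(abs(d) for d in dev.values()) / 24


# Synthetic day: 10 hits per hour, 20 in the hour starting 00:00 GMT.
base = datetime(2020, 1, 13, 17, tzinfo=timezone.utc)  # 00:00 KRAT on 14 Jan
events = []
for i in range(24):
    ts = base + timedelta(hours=i)
    n = 20 if ts.hour == 0 else 10
    # Sources 203.0.113.0..n-1; every third one probes MS SQL, the rest Telnet.
    events += [(ts, f"203.0.113.{k}", 1433 if k % 3 == 0 else 23)
               for k in range(n)]

per_day, per_hour, uniq, per_port, hod = aggregate(events)
dev, avg_dev = hourly_deviation(hod)
peak = max(per_hour, key=per_hour.get)
print(peak.strftime("%H:%M KRAT"), peak.astimezone(timezone.utc).strftime("%H:%M GMT"))
# 07:00 KRAT 00:00 GMT
print(round(dev[7], 1), round(avg_dev, 2))
# 92.0 7.67
print([(SERVICES[p], c) for p, c in per_port.most_common()])
# [('Telnet', 151), ('MS SQL', 99)]
```

<p>The paper reports an average deviation of about 1.5% and a maximum of about 5%; the exaggerated synthetic peak here only makes the alignment of 07:00 KRAT with 00:00 GMT easy to see.</p></div>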
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Mail Server Data Analysis</head><p>The analysis of mail traffic reveals periodicity over both the day and the week (Fig. <ref type="figure">6</ref>).</p><p>Fig. <ref type="figure">6</ref>. Normalized daily number of emails and viruses detected during the year.</p><p>The hourly number of mail viruses in the recipient's time zone (Fig. <ref type="figure" target="#fig_5">7</ref>) correlates well with the amount of spam (correlation coefficient 0.63), whereas the distributions by the sender's time zone show zero correlation, which may indicate different sources of spam and viruses. Using geolocation databases and aggregating the data by territory, we built a distribution of spam sources by country (Fig. <ref type="figure" target="#fig_6">8</ref>). The countries most active as spam sources are Russia, the USA, Germany, France, and China. In terms of the number of virus sources, the United States, France, Russia, and Vietnam lead. Activity and threat level on weekends are 2–3 times lower than on workdays, with Tuesday and Wednesday having the highest threat level. By time of day, the threat level at night is half that during working hours.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Intrusion Prevention System Data Analysis</head><p>The Intrusion Prevention System (IPS) log shows no periodicity with respect to either the day of the week or the time of day (Fig. <ref type="figure" target="#fig_7">9</ref>). The number of blocked network addresses is approximately one third of the number of unique scanning sources per day (on average, 500 and 1,500, respectively), which indicates that only about one source in three takes actions that lead to its blocking. In the hourly distribution of blocking-system responses for the SSH and RDP protocols (Fig. <ref type="figure" target="#fig_8">10</ref>), there is a peak (about 150% of the average) around midnight GMT, which is an additional indicator of a large scanning network launched at 00:00 GMT. The largest number of responses (165 thousand) falls on the SSH service, followed by SMTP (about 30 thousand) and RDP (about 3 thousand). The strong predominance of SSH may be due to the specifics of the blocking system: all connections from a threat source are blocked, and SSH has the lowest port number (22) among popular services and is therefore checked first. The geographical analysis of threat sources shows China (39%) and the United States (12%) in the lead. This suggests purposeful intrusion attempts from China, since China does not hold a leading position in the other data sources.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">WWW Data Analysis</head><p>The analysis of web service logs shows periodicity with respect to both the day of the week and the time of day. In addition, the number of requests tends to increase as the WWW services expand, and a growing number of visitors is also observed. By country, requests are distributed as follows: Russia, about 80% of all visits; the USA, 7%; Germany, 2%. Errors are distributed as follows: Russia, 54%; the USA, 27%; China, 3%. The most popular browsers are Chrome (47%), Firefox (16%), and Internet Explorer (6%). Web spiders and bots account for about 9% of all requests and 32% of all errors.</p><p>When processing web service logs, requests were divided into two non-intersecting groups: legitimate and erroneous requests. Legitimate requests are those processed by a web application or web service in normal mode, with an HTTP response code of 1XX, 2XX, or 3XX. Erroneous requests (errors) are those processed incorrectly, either on the client side (HTTP response code 4XX) or on the server side (5XX). Error analysis is important because it can reveal malicious activity. Figures <ref type="figure" target="#fig_11">11 and 12</ref> show the number of requests and errors per day. As the graphs show, the number of requests and errors drops on holidays, since most services are used during business hours. The correlation coefficient between the hourly request distributions in server time and in client time is 0.984 (Fig. <ref type="figure" target="#fig_2">13</ref>). This indicates that most requests come from Krasnoyarsk (KRAT) and nearby time zones: if clients were spread across distant time zones, their local hourly profiles would be shifted relative to the server's. Requests and errors were normalized by server and client time, and their correlation was calculated (Fig. <ref type="figure" target="#fig_12">14</ref>). The high correlation coefficient for errors was found to be caused by incorrect operation of websites and web services. The remaining errors, however, show the presence of scans and attacks performed by web spiders. 
Since errors indicate attempts to access non-existent or non-public resources, threat sources are likely overrepresented among clients from the United States and China, whose share of errors is several times greater than their share of requests. Most errors are caused by attempts to detect vulnerabilities in popular Content Management Systems (CMS). In addition, the analysis of browsers in terms of requests and errors shows an increased share of errors from web spiders, which may indicate a high threat risk. </p></div>
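<div xmlns="http://www.tei-c.org/ns/1.0"><p>The request split and the share comparison used in this section can be sketched as below. This is an illustrative sketch rather than the authors' code: classify follows the status-code grouping given in the text, and overrepresentation (a hypothetical helper) divides a group's share of errors by its share of requests, using the percentages reported above; China is skipped because its share of requests is not reported.</p>

```python
def classify(status: int) -> str:
    """Group a request by HTTP status code as defined in the text."""
    klass = status // 100
    if klass in (1, 2, 3):
        return "legitimate"
    if klass == 4:
        return "client error"
    if klass == 5:
        return "server error"
    raise ValueError(f"unexpected HTTP status code: {status}")


def overrepresentation(request_share, error_share):
    """Ratio of each group's share of errors to its share of requests.

    Groups missing from either mapping are skipped.
    """
    return {g: error_share[g] / request_share[g]
            for g in error_share if g in request_share}


# Percentages reported in the text.
countries = overrepresentation({"Russia": 80, "USA": 7, "Germany": 2},
                               {"Russia": 54, "USA": 27, "China": 3})
agents = overrepresentation({"web spiders": 9}, {"web spiders": 32})

print([classify(s) for s in (200, 304, 404, 503)])
# ['legitimate', 'legitimate', 'client error', 'server error']
for group, ratio in {**countries, **agents}.items():
    print(f"{group}: {ratio:.2f}")
```

<p>A ratio above 1 means a group produces errors out of proportion to its traffic: about 3.9 for the United States (27% versus 7%) and about 3.6 for web spiders (32% versus 9%), while Russia stays below 1.</p></div>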
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Measures Taken</head><p>Based on the results of the research, several measures were taken to increase the security of the Internet services of the Krasnoyarsk Science Center: 1) the threshold time interval for more confident detection of network scanning was increased; 2) new TCP ports were added to the monitoring system to track malicious activity; 3) firewall settings were optimized to block unwanted hosts more effectively; 4) web server settings were updated to prevent attacks on known CMS vulnerabilities; 5) the network settings of internal switches were optimized to block unwanted traffic between divisions.</p></div>
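<div xmlns="http://www.tei-c.org/ns/1.0"><p>Item 1 concerns the time window used to decide that a source is scanning. The sketch below illustrates that trade-off with a sliding-window detector; it is not the detection logic actually deployed, and the ScanDetector class, its min_targets threshold, and the sample hits are hypothetical. A longer window catches slow scans that a shorter one misses, at the cost of keeping more history per source.</p>

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class ScanDetector:
    """Flag a source that hits `min_targets` distinct targets within `window`."""

    def __init__(self, window=timedelta(hours=1), min_targets=3):
        self.window = window
        self.min_targets = min_targets
        self.history = defaultdict(deque)  # src -> deque of (ts, target)

    def observe(self, ts, src, target):
        """Record one hit and return True if `src` now looks like a scanner.

        Assumes hits for a given source arrive in time order.
        """
        q = self.history[src]
        q.append((ts, target))
        # Discard hits older than the window ending at `ts`.
        while q and ts - q[0][0] > self.window:
            q.popleft()
        return len({t for _, t in q}) >= self.min_targets


# A slow scan: three unused (address, port) targets over 90 minutes.
t0 = datetime(2020, 1, 14, 0, 0)
hits = [(t0, ("192.0.2.10", 23)),
        (t0 + timedelta(minutes=30), ("192.0.2.11", 23)),
        (t0 + timedelta(minutes=90), ("192.0.2.12", 1433))]

results = {}
for hours in (1, 2):
    det = ScanDetector(window=timedelta(hours=hours))
    results[hours] = [det.observe(ts, "203.0.113.5", tgt) for ts, tgt in hits]
    print(hours, results[hours])
# 1 [False, False, False]
# 2 [False, False, True]
```

<p>With a one-hour window the slow scan never accumulates three targets; doubling the window flags it on the third hit.</p></div>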
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">Conclusion</head><p>In this work, we analyzed log data from the corporate Internet services of the Krasnoyarsk Science Center. The main sources of cybersecurity threats were identified, and new signs of threat sources were determined that can be used to improve corporate network security systems. In general, the security tools applied make it possible to detect and block threats at early stages. The results of the study make it possible to optimize protection systems against network attacks, taking into account the identified threat sources that standard security tools did not previously consider. The measures taken improve responsiveness to emerging threats and the cybersecurity of the organization as a whole.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Incoming connections to unused network addresses (daily, June 2019 to June 2020).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Incoming connections to unused network addresses (hourly).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Number of unique scan sources per day for April 2020.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Distribution of scans by hours.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 5 .</head><label>5</label><figDesc>Fig. 5. Scan rates by service.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Fig. 7 .</head><label>7</label><figDesc>Fig. 7. Distribution of spam and viruses by the recipient's time zone.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Fig. 8 .</head><label>8</label><figDesc>Fig. 8. Leading countries of mail spam.</figDesc><graphic coords="5,356.88,363.84,70.08,119.28" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Fig. 9 .</head><label>9</label><figDesc>Fig. 9. Histogram of the number of blocks during the year.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Fig. 10 .</head><label>10</label><figDesc>Fig. 10. Hourly distribution of IPS responses.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Fig. 11 .</head><label>11</label><figDesc>Fig. 11. Number of requests per day during the year.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Fig. 12 .</head><label>12</label><figDesc>Fig. 12. Number of errors per day during the year.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_12"><head>Fig. 14 .</head><label>14</label><figDesc>Fig. 14. Distribution of requests and errors by server and client time.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Fig. 13 .</head><label>13</label><figDesc>Fig. 13. Number of requests and errors per hour.</figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Discovering and utilising expert knowledge from security event logs</title>
		<author>
			<persName><forename type="first">S</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Parkinson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Information Security and Applications</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="page">102375</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Dynamic log file analysis: An unsupervised cluster evolution approach for anomaly detection</title>
		<author>
			<persName><forename type="first">M</forename><surname>Landauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wurzenberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Skopik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers &amp; Security</title>
		<imprint>
			<biblScope unit="volume">79</biblScope>
			<biblScope unit="page" from="94" to="116" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Fast attack detection system using log analysis and attack tree generation</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-H</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Shin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Shin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cluster Computing</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1827" to="1835" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Email Classification Research Trends: Review and Open Issues</title>
		<author>
			<persName><forename type="first">G</forename><surname>Mujtaba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Shuib</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">G</forename><surname>Raj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Majeed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Al-Garadi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="9044" to="9064" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">An Intrusion Action-Based IDS Alert Correlation Analysis and Prediction Framework</title>
		<author>
			<persName><forename type="first">K</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="150540" to="150551" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Towards a system for complex analysis of security events in large-scale networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Sapegin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jaeger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Meinel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers &amp; Security</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="page" from="16" to="34" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Research of network anomalies in the corporate network of the Krasnoyarsk Scientific Center</title>
		<author>
			<persName><forename type="first">N</forename><surname>Kulyasov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Isaev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Siberian Journal of Science and Technology</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="412" to="422" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
