Problem of Developing an Early-Warning Cybersecurity System for Critically Important Governmental Information Assets

Sergei A. Petrenko, Alexey S. Petrenko
Information Security Department
Saint Petersburg Electrotechnical University "LETI"
St. Petersburg, Russia
s.petrenko@rambler.ru; a.petrenko@rambler.ru

Krystina A. Makoveichuk
Department of Informatics and Information Technologies
Vernadsky Crimean Federal University
Yalta, Russia
christin2003@yandex.ru

Abstract: The article considers possible solutions to the relatively new scientific-technical problem of developing an early-warning cybersecurity system for critically important governmental information assets. The solutions proposed are based on the results of exploratory studies conducted by the authors in the areas of Big Data acquisition, cognitive information technologies (cogno-technologies), and "computational cognitivism," involving a number of existing models and methods. The results obtained permitted the design of an early-warning cybersecurity system.

Keywords: Cyberspace; critically important infrastructure; information confrontation; hybrid wars; cyberattacks; information security; cybersecurity system; early-warning; Big Data; Big Data Analytics; cogno-technologies; computational cognitivism; scenarios of an early-warning; synthesize scenarios.

I. INTRODUCTION

Nowadays, information confrontation plays an increasingly important role in modern, "hybrid" wars. Furthermore, victory is often attained not only via military or numerical superiority, but rather by information influence on various social groups or by cyberattacks on critically important governmental infrastructure.

In this regard, means for detecting and preventing information and technical impacts should play a crucial role. Currently, systematic work is being done in Russia to create a National Cyberattack Early-Warning System. A number of state and corporate cybersecurity response centers have already been organized.

However, the technologies applied in these centers allow only the detection and partial repulsion of ongoing attacks; they do not have the capacity to predict and prevent attacks that are still in the preparation stage [1].

Such a situation requires the creation of fundamentally new information security systems that are capable of controlling the information space, of generating and simulating scenarios for the development, prevention and deterrence of destructive information and technical impacts, and of initiating proactive responses to minimize their negative impact. New technologies in big data and deep learning, as well as in semantic and cognitive analysis, which are now capable of proactively identifying the invader's hidden meanings and goals that other types of analysis could not discover, will likely play an instrumental role here. This article aims to develop these methods and technologies.

II. PROBLEMS OF DEVELOPING AN EARLY-WARNING CYBERSECURITY SYSTEM

At the same time, it is impossible to implement a National Cyberattack Early-Warning System without also tackling a series of related issues. Most notably, this will necessarily entail the creation of an effective computing infrastructure that supports new methods and technologies for modeling the development, prevention and deterrence of destructive information and technical impacts in real time, or even preemptively. Clearly, this problem will not be solved without high-performance computing systems or a supercomputer.

We must confess that Russia currently lags far behind leading Western countries in terms of its supercomputer technology. Cluster supercomputers primarily used in our country are usually based on a CKD assembly from commercially available foreign processing nodes and network switches. It is well known that this class of supercomputers demonstrates its optimal performance when solving loosely bound problems that do not require intensive data exchange between processor nodes.

The actual performance of cluster supercomputers, however, is significantly reduced when solving tightly bound problems, in particular the semantic and cognitive analysis of big data. Moreover, attempts to increase cluster system performance by increasing the number of processing nodes have often not only failed to yield positive results but have had the opposite effect, owing to a heightened proportion of non-productive "overhead" in the total solution time, which arises not from "useful" processing but from organizing the parallel calculation process.

These fundamental disadvantages of modern cluster supercomputers are a product of their "hard" architecture, which is fixed at the stage of computer construction and cannot be modified while being used [1-4].

Developed by Russian scientists, the concept of a reconfigurable supercomputer made it possible to adjust the architecture to the structure of the task's solution without entailing the aforementioned disadvantages. In this case, a set of field programmable logic devices (FPLG) of a large integration degree comprises the entire computing field and enables the user to create task-oriented computing structures similar to the graph algorithm of the given task; this field, rather than a standard microprocessor, is used as the supercomputer's computational device. This approach ensures a "granulated" parallel computing process as well as a high degree of time efficiency in its organization, achieved by adjusting the computing architecture to the applied task.

As a result, near-peak performance of the computing system is achieved, and it grows linearly when the hardware resources of the FPLG computational field are increased [5-8].

Today, reconfigurable FPLG-based computing systems are increasingly finding use in solving a number of topical applied tasks, primarily computationally labor-intensive and "tightly coupled" streaming tasks that require mass data (stream) processing, as well as tasks that require the processing of non-standard data formats or a variable number of bits (e.g. the applied fields of big data semantic and cognitive analysis, cryptography, image processing and recognition, etc.).

This allows us to estimate the prospects of using reconfigurable supercomputer technology when establishing a National Cyberattack Early-Warning System [1].

III. NATIONAL SUPERCOMPUTER GRID NETWORK

At the same time, one supercomputer, even the most productive one, is not enough to create the computing infrastructure of the National Cyberattack Early-Warning System.

Obviously, such a system should be built on a network of supercomputer centers, with each unit having its own task focus, while preserving the possibility to combine all the units into a single computing resource; this would, de facto, provide a solution to the computationally labor-intensive tasks of real-time and preemptive modeling of development scenarios for the prevention and deterrence of destructive information and technical impacts. In other words, the National Cyberattack Early-Warning System should be based on a certain segment (possibly secured from outside users) of the National Supercomputer GRID network.

Furthermore, establishing a National Supercomputer GRID network raises the complex problem of optimal distribution (dispatching) of computational resources while solving a stream of tasks on modeling development scenarios for cyberattack prevention and deterrence [9].

Nowadays, the problem of dispatching in distributed computer networks is solved with specially allocated server nodes. Such centralized dispatching is effective when working with a small computational capacity or nearly homogeneous computational resources. However, in cases of numerous, heterogeneous network resources, the operational distribution (and redistribution) of tasks, not to mention of informationally related subtasks, via a single central dispatcher becomes difficult to implement. Moreover, using a centralized dispatcher significantly reduces the reliability and fault tolerance of the GRID network, since a failure of the service server node that implements the dispatcher functions will lead to disastrous consequences for the entire network.

These disadvantages can be avoided by using the principles of decentralized multi-agent resource management of the GRID network. In this case, software agents that are physically implemented in each computational resource of the GRID network play the main role in the dispatching process and represent the interests of their resources in it. Each agent knows the computing capabilities of "its own" resource and responsively tracks all changes (e.g. performance degradation owing to the failure of numerous computing nodes).

Given this information, the agent can "allocate" its resource for solving tasks where "its" resource will prove most effective. If the computing resource of one agent is not enough to solve the problem in the given time, then a community of agents is created, with each one providing its resources for solving the various parts of a single task.

The benefits of a decentralized multi-agent dispatching system in a National Supercomputer GRID network are manifold:
• Ensure efficient loading of all computational resources included in the GRID network, by using up-to-date information about their current status and task focus;
• Ensure the adaptation of the computational process to all resource changes in the cloud environment;
• Reduce the overhead costs of GRID network organization, since there is no need to include special service servers as a central dispatcher;
• Increase the reliability and fault tolerance of the GRID network and, as a result, the dependability of computing, since the system will not have any elements whose failure may lead to disastrous consequences for the entire network [4, 7-8].

IV. DEVELOPMENT OF AN EARLY-WARNING CYBERSECURITY SYSTEM

As a technological basis for solving this problem, it is proposed to consider modern software and hardware systems for analyzing and processing information security events [10]. In international practice, these complexes are developed as part of specialized security centers, known as the Computer Emergency Response Team (CERT), the Computer Security Incident Response Team (CSIRT), or the Security Operation Center (SOC).

The Russian Federation has already established a number of state and corporate centers for detecting, preventing, and recovering from cyber-attacks, or centers for responding to cybersecurity incidents, which are similar to foreign CERT / CSIRT / SOC in their functionality. In domestic practice, they are known as SOPCA. Some examples include, inter alia, GOV-CERT.RU (FSS of Russia), SOPCA of the Ministry of Defense of Russia, FinCERT (Bank of Russia), Rostechnologies CERT, Gazprom SOC, etc.

The Russian Federation Presidential Decree No 31c of January 15, 2013 "On the establishment of a state system for detecting, preventing, and recovering from cyber-attacks on Russian information resources" establishes that the Russian FSS issues methodological recommendations on the organization of protection of the critical information infrastructure of the Russian Federation and organizes work on the creation of the State and corporate segments of Monitoring in the Detection, Prevention and Cyber Security Incident Response (SOPCA).

The Concept of a state system for detecting, preventing, and recovering from cyber-attacks on Russian information resources (No K 1274), approved by the President of the Russian Federation on December 12, 2014, defines the shape of the state SOPCA system, which is based on special centers for detecting, preventing, and recovering from cyber-attacks, divided into centers of:
• The Russian FSS (created to protect information resources of the public authorities);
• State and commercial organizations (created to protect their own information resources).

In addition, these centers are coordinated by the National Coordinating Center for Computer Crimes under the FSS of Russia.

At the same time, in practice, the task of developing a cognitive early warning system for cyber-attacks on the information resources of the Russian Federation was far from trivial.

It was necessary to conduct appropriate scientific research and solve a series of complex scientific and technical problems (e.g. input data classification, identification of primary and secondary signs of cyber-attacks, early cyber-attack detection, multifactor prediction of cyber-attacks, modeling of cyber-attack spread, training, and generation of new knowledge on quantitative patterns of information confrontation), many of which did not have ready standard solutions [11].

In addition, it was essential to ensure the collection, processing, and storage of big data, as well as analytical calculations on extremely large amounts of structured and unstructured information from a variety of Internet / Intranet and IoT / IIoT sources (big data and big data analytics) [17]. A possible list of requirements for such cognitive systems is presented in Table I.

TABLE I. REQUIREMENTS FOR COGNITIVE SYSTEMS

General requirements set during the implementation of the software and hardware complex (SHC) "Warning-2016":
- monitoring a large number of objects in real time (1,000,000+);
- low delay in event processing (less than 10 ms);
- distributed storage and fast access to data for petabyte data volumes;
- a high degree of reliability of data and knowledge storages, able to operate 24 hours a day, 7 days a week, without risk of interruption or loss of information in the event of the failure of one or more servers;
- ability to scale (including by means of the underlying software) in performance and in the volume of processed information without modifying the installed software, by upgrading / scaling the hardware used;
- an SHC availability level of at least 99% per year;
- possibility of SHC integration with third-party systems: the architecture of the complex should be created taking into account openness and the ease of introducing interaction modules with external systems.

Requirements taken into account when creating the data storage of SHC "Warning-2016":
• data stored in the repository is a series of records characterized by a time stamp (time series); thus, the repository should be optimized for storing time series;
• high speed of data recording;
• high speed of MapReduce operations with preliminary selection on time intervals;
• ability to work with data that has a coordinate as one of its properties (mobile sensors);
• low requirements for data consistency (eventual consistency);
• immutable data, without the need to conduct distributed transactions or synchronization.

Requirements for the data and knowledge collection, preliminary processing and analysis subsystem:
• receiving data over various information interaction protocols, e.g. ZMQ (ZeroMQ), TCP / IP, RAW TCP / IP, HTTP (REST request processing), AMQP, SMTP, etc.;
• receiving data in XML, JSON / BSON, and PlainText formats;
• detection of incidents and security threats by applying the following models to the incoming data stream:
  - detection of excess / decrease of various parameters, with thresholds set for these parameters;
  - detection of deviation from normal values for various parameters;
  - detection of statistical deviations from standard behavior for various parameters in the time window;
  - average value change over a certain time interval;
  - change in the characteristics of event occurrence frequency, etc.;
• use of machine learning models to identify correlations and detect incidents (i.e. the application of multivariate analysis, clustering and classification methods);
• identification of various kinds of templates in text messages described by regular expressions, and application of the above-mentioned statistical functions to them;
• correlation of data from various sources;
• combination of parameters from various sources with subsequent application of the above-mentioned statistical methods;
• testing of the models for detecting new incidents, etc.;
• notification of users about detected incidents by sending messages to the visualization and administration subsystem or to other IS.

Requirements for the data storage and knowledge subsystem:
- support for structured and unstructured data types;
- support for data indexes to speed up data search and retrieval;
- ability to work with time series;
- ability to create queries in the MapReduce paradigm;
- implementation of aggregation and statistical queries on the time series in the data storage location;
- ability to automatically remove outdated time series data;
- availability of libraries for accessing storage functions from Java and Python;
- a library for accessing storage system functions through specialized drivers (e.g. a Django database engine);
- knowledge support for working with new models of neurophysics and classical methods of artificial intelligence.

The modeling, decision-making, visualization, and administration subsystem needed to support models and methods of neurophysics, artificial intelligence, and mathematical logic, including cognitive agents and feedforward artificial neural networks [12] trained by the Levenberg-Marquardt method, and so on.

The visualization and administration subsystem needed to support:
- statistical reports on incidents and stored time series;
- density distribution function graphs;
- histograms of the distribution of cybersecurity values;
- series graphs with different characteristics (mean, extrapolation, etc.);
- correlation models for performing multifactor analysis;
- graphs of parameter correlation and incidents on a selected time interval;
- classification models to detect the correlation of parameters from various sources with incident occurrence;
- clustering models for detecting parameter correlation over a given time interval, etc.

A possible system architecture of the cognitive early warning system for cyber-attacks on information resources of the Russian Federation, based on NBIC technologies, is presented in [1]. The positive experience gained in the creation of the cognitive early warning system for cyber-attacks of SHC "Warning-2018" speaks to the expediency of a methodical approach to solving the task.

Stage 1. Developing the technical component of a traditional SOPCA based on big data technologies: the creation of a high-performance corporate (state) segment for detecting, preventing, and recovering from cyber-attacks.

Stage 2. Creation of the SOPCA analytical component based on "computational cognitivism": the realization of the cognitive component of the cyber-attack early warning system, capable of independently extracting and generating useful knowledge from large volumes of structured and unstructured information for SOPCA operational support.

In this case, the above-mentioned technical component of SOPCA based on big data technologies should be allocated the following functions:
• Collection of big data on the information security state in controlled information resources;
• Data detection and recovery after cyber-attacks on information resources;
• Support of software and technical tools for IS event monitoring;
• Interaction with the state SOPCA centers;
• Information on the detection, prevention, and recovery from cyber-attacks, etc. (Fig. 1).

Fig. 1. Technical component of SHC "Warning-2016"

Appropriate technological solutions for creating a cognitive early warning system for cyber-attacks on Russia's information resources are presented in [1].

Here, the choice and implementation of the big data processing component represented an important task. Another important task was designing the big data storage structure. Many known solutions (e.g. Cassandra or HBase) proved to be of little use due to the following limitations:
• Lack of database components to ensure efficient storage and retrieval by time series (most known solutions do not contain integration tools due to their closed nature, and those available, e.g. InfluxDB, do not have a high level of work stability);
• Absence of logical connections between the interfaces of business logic and the database;
• Duplication of system functionality due to the database and the processing logic being separated in a heterogeneous solution environment;
• Limited performance of the HBase solution, associated with its architectural features;
• Significant overhead in Cassandra, associated with the synchronization of data on various nodes, etc.

TABLE II. KNOWN SOLUTIONS FOR STREAMING AND BATCH DATA PROCESSING

Solution | Developer | Type | Description
Storm | Twitter | Packaged | New solution for Big Data streaming analysis by Twitter
S4 | Yahoo! | Packaged | Distributed streaming processing platform by Yahoo!
Hadoop | Apache | Packaged | First open source realization of the MapReduce paradigm
Spark | UC Berkeley AMPLab | Packaged | New analytic platform supporting data sets in RAM; has a high failure safety level
Disco | Nokia | Packaged | MapReduce distributed environment
HPCC | LexisNexis | Packaged | HPC cluster for Big Data

The analytical component based on "computational cognitivism" should be allocated the following functions:
• An early warning system for cyber-attacks on information resources;
• Identification and generation of new useful knowledge about qualitative characteristics and quantitative patterns of information confrontation;
• Prediction of security incidents caused by known and previously unknown cyber-attacks;
• Preparation of scenarios for deterring a cyber-opponent and planning a response adequate to the computer aggression.

In the following sections on this issue, the practice of using big data technologies to organize the stream processing of cybersecurity data, as well as practical questions of semantic Master Data Management (MDM), will be considered for building the SOPCA knowledge base.

The development of a new functional model of a cognitive high-performance supercomputer will also be justified; possible prototypes of software and hardware complexes for the early detection and prevention of cyber-attacks will be presented; examples of solutions to classification and regression problems will be given, as will solutions to the search for associative rules and clustering; and possible directions for the development of artificial cognitive cybersecurity systems will be outlined (Table III).

TABLE III. EXAMPLE SHC COMPONENT "WARNING-2018"

Cognitive subsystem of early warning system for cyber-attacks on Russian information resources

1. Development of traditional SOPCA components based on big data technologies
2. Creation of analytical SOPCA component based on "computational cognitivism"
3. Monitoring the security and sustainability state of critically important infrastructure operation in cyberspace:
• Introducing the informatization CWE passport (inventory, categorization, classification, definition of requirements, etc.);
• Creating priority action plans, etc.;
• Certifying the information security tools (facility attestation for safety requirements [13-15]);
• Monitoring safety criteria and indicators and stability of the given facilities' operation;
• Maintaining a database of cybersecurity incidents.
4. Identification of preliminary signs of cyber-attacks on Russian Federation information resources:
• Recognizing structural, invariant, and correlation features of cyber-attacks;
• Adding primary signs of cyber-attacks to the database;
• Clarifying cyber-attack scenarios;
• Developing adequate measures for deterrence and compensation.
5. Identification of secondary signs of cyber-attacks on Russian Federation information resources:
• Identifying correlation links and dependencies between the signs;
• Adding secondary signs of cyber-attacks to the database;
• Clarifying cyber-attack scenarios;
• Developing adequate measures for deterrence and compensation.
6. From detection to prevention:
• Early warning for a cyber-attack on Russian information resources;
• Prediction of a cyber-attack from a cyber-enemy;
• Assessing possible damage in case of cyber-attack;
• Preparing scenarios of deterrence and coercion response to the cyberworld preparation.
7. Extraction of useful knowledge and generation of new knowledge in the field of information confrontation based on:
• New NBIC models:
  - Neuromorphic, similar to the living nervous system structure;
  - Corticomorphous, similar to the cerebral cortex structure;
  - Genomorphic, similar to genetic and epigenetic mechanisms of living organisms' reproduction and development;
• Models and methods of mathematical logic and artificial intelligence;
• Cognitive agents;
• Feedforward artificial neural networks, trained according to the Levenberg-Marquardt method;
• Educable, hierarchically ordered neural networks and binary neural networks;
• Various representations of dynamic thresholds and classifiers of network packets based on the Euclidean-Mahalanobis metric and the support vectors method;
• Statistical (correlation) and invariant profilers;
• Complex poly-model representations, etc.
8. Development of guidelines for work with cognitive SOPCA
9. Cyber-training organization to develop skills of early warning for cyber-attacks on information resources of the Russian Federation
10. Development of the necessary normative documents
11. Training and retraining of employees on issues relating to the early warning for cyber-attacks on information resources of the Russian Federation
12. Elaboration of proposals for the development of a national (and international) regulatory framework for cyber-attack early warning.

V. CONCLUSIONS

The article shares insight gained during the process of designing and constructing open-segment prototypes of an early-warning cybersecurity system for critical national infrastructure in the Russian Federation. The results obtained permitted the design of an early-warning cybersecurity system. In addition, prototypes were developed and tested for software and hardware complexes of stream pre-processing and processing as well as big data storage security, which surpass the well-known solutions based on Cassandra and HBase in terms of performance characteristics. As such, it became possible, for the first time, to synthesize scenarios of an early-warning cybersecurity system in cyberspace on extra-large volumes of structured and unstructured data from a variety of sources: Internet/Intranet and IoT/IIoT (Big Data and Big Data Analytics) [16].

REFERENCES

[1] Petrenko S.A., Stupin D.D. Natsional'naya sistema rannego preduprezhdeniya o komp'yuternom napadenii [National system of advance computer attacks alerting]. Innopolis, Afina Publ., 2017. 440 p. (In Russ.).
[2] Guarino N. Services as Activities: Towards a Unified Definition for (Public) Services. In Proc. Enterprise Distributed Object Computing Workshop (EDOCW), 2017 IEEE 21st International. Quebec City, QC, Canada, 10-13 Oct. 2017, pp. 102-105. DOI: 10.1109/EDOCW.2017.25.
[3] Nardi J., Falbo R., Almeida J., Guizzardi G., Pires L., Sinderen M., Guarino N. An Ontological Analysis of Value Propositions. In: Enterprise Distributed Object Computing Conference (EDOC), 2017 IEEE 21st International. Quebec City, QC, Canada, 10-13 Oct. 2017, pp. 184-193. DOI: 10.1109/EDOC.2017.32.
[4] Pashchenko I.N., Vasilyev V.I., Guzairov M.B. Smart Grid security system on the basis of intelligent technologies: rule base design. Izvestiya SFedU. Engineering Sciences [News of SFedU. Technical science], 2015, pp. 28-37. (In Russ.).
[5] Pospelov D.A. Introduction to applied semiotics. News of Artificial Intelligence, 2002, no. 6. (In Russ.).
[6] Pospelov G.S. Artificial intelligence is the basis of the new information technology. Moscow, Nauka, 1988. 280 p. (In Russ.).
[7] Vorobiev E.G., Petrenko S.A., Kovaleva I.V., Abrosimov I.K. Organization of the entrusted calculations in crucial objects of informatization under uncertainty. In Proceedings of the 20th IEEE International Conference on Soft Computing and Measurements (24-26 May 2017, St. Petersburg, Russia). SCM 2017, 2017, pp. 299-300. DOI: 10.1109/SCM.2017.7970566.
[8] Vorobiev E.G., Petrenko S.A., Kovaleva I.V., Abrosimov I.K. Analysis of computer security incidents using fuzzy logic. In Proceedings of the 20th IEEE International Conference on Soft Computing and Measurements (24-26 May 2017, St. Petersburg, Russia). SCM 2017, 2017, pp. 369-371. DOI: 10.1109/SCM.2017.7970587.
[9] Massel L., Voropay N., Senderov S., Massel A. Cyber Danger as One of the Strategic Threats to Russia's Energy Security. Voprosy kiberbezopasnosti [Cybersecurity issues]. 2016. No 4 (17), pp. 2-10. DOI: https://doi.org/10.21681/2311-3456-2016-4-2-10.
[10] Dorofeev A.V., Markov A.S., Tsirlov V.L. Social Media in Identifying Threats to Ensure Safe Life in a Modern City. Communications in Computer and Information Science, 2016, vol. 674, pp. 441-449. DOI: 10.1007/978-3-319-49700-6_44.
[11] Sheremet I.A. Augmented Post Systems: The Mathematical Framework for Data and Knowledge Engineering in Network-centric Environment. Berlin, 2013. 395 p.
[12] Starodubtsev Yu.I., Grechishnikov E.V., Komolov D.V. Use of neural networks to ensure stability of communication networks in conditions of external impacts. Telecommunications and Radio Engineering. 2011. V. 70. N 14. P. 1263-1275.
[13] Kozachok A., Bochkov M., Lai M.T., Kochetkov E. First Order Logic for Program Code Functional Requirements Description. Voprosy kiberbezopasnosti [Cybersecurity issues]. 2017. N 3 (21), pp. 2-7. DOI: 10.21681/2311-3456-2017-3-2-7.
[14] Reber G., Malmquist K., Shcherbakov A. Mapping the Application Security Terrain. Voprosy kiberbezopasnosti [Cybersecurity issues]. 2014. N 1(2). P. 36-39. DOI: 10.21681/2311-3456-2014-2-36-39.
[15] Barabanov A., Markov A., Tsirlov V. Procedure for Substantiated Development of Measures to Design Secure Software for Automated Process Control Systems. In Proceedings of the 12th International Siberian Conference on Control and Communications (Moscow, Russia, May 12-14, 2016). SIBCON 2016. IEEE, 7491660, 1-4. DOI: 10.1109/SIBCON.2016.7491660.
[16] Petrenko S.A., Makoveichuk K.A., Chetyrbok P.V., Petrenko A.S. About Readiness for Digital Economy. In Proceedings of the 2017 IEEE II International Conference on Control in Technical Systems, IEEE, CTS, 2017, pp. 96-99. DOI: 10.1109/CTSYS.2017.8109498.
[17] Petrenko A.S., Petrenko S.A., Makoveichuk K.A., Chetyrbok P.V. The IIoT/IoT device control model based on narrow-band IoT (NB-IoT). In Proceedings of the 2018 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (29 Jan.-1 Feb. 2018, Moscow and St. Petersburg, Russia) EIConRus, IEEE, 2018, pp. 950-953. DOI: 10.1109/EIConRus.2018.8317246.
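
ILLUSTRATIVE SKETCH: DECENTRALIZED MULTI-AGENT DISPATCHING

The decentralized dispatching described in Section III (each agent knows the capability of "its own" resource, and a community of agents is formed when one resource cannot meet the deadline) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Agent` class, the `throughput` measure (work units per second), the greedy "strongest peers first" recruitment rule, and the proportional work split are all assumptions introduced here. The exchange of offers between agents is simulated in a single process; in a real GRID network it would presumably be carried by messages between resources.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical software agent attached to one GRID resource."""
    name: str
    throughput: float  # assumed measure: work units per second

    def offer(self) -> float:
        # The agent reports the current capability of "its own" resource.
        return self.throughput

    def recruit(self, peers, work, deadline):
        """Acting as initiator, build a community that meets the deadline.

        Any agent can play this role, so no dedicated dispatcher node exists.
        Returns [] when even all peers together are too slow.
        """
        required = work / deadline  # combined throughput needed
        community, combined = [self], self.offer()
        # Assumed policy: ask the strongest peers first.
        for peer in sorted(peers, key=lambda p: p.offer(), reverse=True):
            if combined >= required:
                break
            community.append(peer)
            combined += peer.offer()
        return community if combined >= required else []


def split_work(community, work):
    """Divide one task among community members in proportion to throughput."""
    total = sum(agent.offer() for agent in community)
    return {agent.name: work * agent.offer() / total for agent in community}


a = Agent("center-a", 40.0)
b = Agent("center-b", 25.0)
c = Agent("center-c", 10.0)

# 600 work units in 10 s need 60 units/s: center-a alone is not enough.
community = a.recruit([b, c], work=600.0, deadline=10.0)
shares = split_work(community, work=600.0)
print([agent.name for agent in community], shares)
```

Because every agent carries the same `recruit` logic, the failure of any single node removes only that node's capacity, which is the fault-tolerance argument made in Section III.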