Improving the Efficiency of Typical Scenarios of Analytical Activities

Oleksandr V. Koval 1, Valeriy O. Kuzminykh 1, Iryna I. Husyeva 1, Xu Beibei 2 and Zhu Shiwei 2

1 National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, 03056, Ukraine
2 Information Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250316, China

Abstract
The article considers methods for evaluating the efficiency of modeling scenarios of analytical activities based on an ontology represented as a graph. An algorithm for evaluating the efficiency of scenario modeling by analytical actions is proposed, together with an algorithm for the sequential check of the obtained scenario for optimality. The results of simulation experiments on scenario modeling with the proposed methods and the developed software, run on a set of two hundred test examples, are described. Analysis of the test results shows a significant reduction in costs when this approach based on typical scenarios is used. At the same time, typical scenarios make it possible to significantly simplify analytical activities and increase their efficiency. The developed approach can be used in the development of various information and analytical systems.

Keywords
analytical activity, scenario of analytical activity, ontology of subject domain

1. Introduction

Today, the efficiency of management decisions in almost all areas of activity, including the design of complex technical systems by scientists and engineers, directly depends on the availability and quality of the information and knowledge analyzed. But every year the amount of data, information, and knowledge needed to make balanced decisions increases exponentially, while the time available for analyzing it becomes increasingly limited. This forces developers of software that supports information analysis to constantly improve methods and technologies for collecting, structuring, and analyzing various data.
All this encourages a further search for ways to improve both scientific and technological approaches that contribute to the rational organization of analytical activities and improve the quality of the information analyzed and the efficiency of software systems in general. The modern development of information technologies, including those that solve the problems of analytical activities, is characterized by a number of trends, chief among them a shift from customization, which has long been the standard, towards personalization. Such trends include the accumulation and use of knowledge about the subject domain and about the accumulated demand for solving problems in software systems, the intellectualization of both software systems and decision-making processes [1], structural algorithmization, and, as a consequence, service-oriented architecture in the software systems being developed. These trends require further scientific research and the formation of modern architectural solutions based on up-to-date approaches to the design and implementation of software systems for supporting analytical activities.

XXI International Scientific and Practical Conference "Information Technologies and Security" (ITS-2021), December 9, 2021, Kyiv, Ukraine
EMAIL: avkovalgm@gmail.com (O.V. Koval); vakuz0202@gmail.com (V.O. Kuzminykh); iguseva@yahoo.com (I.I. Husyeva); xubeibei1987@163.com (X. Beibei); zhusw@sdas.org (Z. Shiwei)
ORCID: 0000-0003-0991-6405 (O.V. Koval); 0000-0002-8258-0816 (V.O. Kuzminykh); 0000-0002-8762-3918 (I.I. Husyeva); 0000-0003-1430-5334 (X. Beibei); 0000-0002-6651-8449 (Z. Shiwei)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

The steady trend towards the intellectualization of software components that implement a range of analytical processes is due to both objective and subjective factors.
Objective factors include the growing complexity of the data analysis processes and, as a result, the processing scenarios. Depending on the goals and objectives of the analysis, as well as the nature of data (structured, unstructured, poorly structured, web data, or social media data), various processing methods can be used for analysis, such as cleaning, normalization, format conversion, cluster analysis, training, modeling, forecasting, graph analysis, quality assessment, etc. The variety of processing methods and technologies requires from the analyst both the skills for their application and specialized knowledge of the intended usage. In addition, for the analysis of different data types, it is desirable to use those scenarios of analytical processing which correspond to both the type of data and the goals and objectives of the analysis. Subjective factors are associated with human actions (expert analyst) - the steps of the analysis of a complex subject domain, and with the ability to perceive and interpret the results of analytical research. Of course, both objective and subjective factors affect the efficiency of analytical methods and the results of analytical activities in general. Therefore, for a certain range of analytical software systems, it is very important to facilitate the processes of modeling and execution of complex scenarios through their intellectualization. One of the manifestations of intellectualization is the accumulation of knowledge about the peculiarities of the functioning of analytical software systems, including knowledge of the actions of the analyst in performing typical tasks. In this context, the meaning of knowledge accumulation is that the software can independently classify new data obtained during the analysis, and offer the user the most appropriate steps of the scenario, based on accumulated knowledge and the state of previous steps [2, 3]. 
That is, a modern analytical tool should be able to predict the most likely next step of a complex scenario and offer it to the user, based on the knowledge available in the system and the conditions prevailing during scenario execution.

Today, analytical activity is a set of actions based on concepts, methods, tools, and normative and methodological materials for data collection, accumulation, processing, and analysis, performed to justify and make decisions or to generate new knowledge. The object of analytical activity is often a set of complex interconnected processes, each of which can be characterized by a significant number of parameters, continuous change, and dynamics that are difficult to predict. A characteristic feature of such problems is the lack of a clear algorithm that would always lead to the desired solution.

2. Scenarios of analytical activity

The scenario of analytical activity, based on the paradigm of software intellectualization, can be considered as a certain structure of knowledge representation, which is used to describe a sequence of related events in the form of a graph. In this approach, the digraph determines a set of ways to achieve the goal in a particular stereotypical situation (pattern), presented in the form of a semantic network, such as an ontology [4,5]. That is, to generate a scenario of analytical activity with elements of intellectualization, a directed acyclic graph (DAG) of possible scenario operations must be formed. Recall that a DAG reflects the assumed relations between variables, which are the nodes of the constructed graph, and contains no directed cycles [6]. With this approach, the task of building a scenario of analytical activity reduces to the well-known problem of finding the shortest path (Single Source Shortest Path) [7].
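As an illustration of this formulation, the following is a minimal sketch (not the authors' implementation) of a single-source shortest path search on a DAG, which relaxes edges in topological order. The function name, the node labels, and the edge costs are hypothetical and chosen only for the example; the sketch assumes Python 3.9 or later for the standard `graphlib` module.

```python
import math
from graphlib import TopologicalSorter


def dag_shortest_path(edges, source, target):
    """Return (cost, path) of the cheapest path from source to target.

    edges maps (u, v) pairs to a non-negative cost; the graph must be acyclic.
    """
    # TopologicalSorter expects a mapping: node -> set of its predecessors.
    preds = {}
    for u, v in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)

    dist = {node: math.inf for node in preds}
    best_pred = {}
    dist[source] = 0.0

    # In topological order every predecessor is final before its successor.
    for v in TopologicalSorter(preds).static_order():
        for u in preds[v]:
            candidate = dist[u] + edges[(u, v)]
            if candidate < dist[v]:
                dist[v] = candidate
                best_pred[v] = u

    if math.isinf(dist[target]):
        return math.inf, []  # target is unreachable from source

    # Walk back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(best_pred[path[-1]])
    return dist[target], path[::-1]


# Hypothetical scenario graph: nodes are analytical actions, weights are costs.
example_edges = {
    ("a0", "a1"): 2,
    ("a0", "a2"): 5,
    ("a1", "a2"): 1,
    ("a1", "a3"): 7,
    ("a2", "a3"): 2,
}
cost, path = dag_shortest_path(example_edges, "a0", "a3")
```

Because the graph is acyclic, one pass in topological order is sufficient, so no priority queue is needed.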
Thus, the search for the shortest path in the graph from the initial to the final point, taking into account the accumulated knowledge of stereotyped user actions, can be considered a basis for intellectualizing the process of forming a scenario of analytical activity. Given the trends in software intellectualization, this problem statement remains relevant not only for intellectualizing the formation of analytical activity scenarios but also for solving a wide range of tasks, including the interaction of analysts within organizational management systems.

Modern analytical activities should be based on the widespread use of information technology. In the context of analytical activities, technology can be considered as a combination of five types of engineering - computer, software, systems, data, and knowledge engineering. Engineering is the application of scientific approaches to the design or development of structures, devices, processes, and works [8]. For analytical software systems, the central object is the knowledge base, which is formed according to the typical tasks of the subject domain and takes into account its features, which facilitates well-grounded decisions. Modern methods of knowledge engineering for building a knowledge base (obtaining knowledge from an expert, data mining, machine learning, etc.) rely on an ontology - a detailed formalization of subject domain knowledge using a conceptual scheme. This conceptual scheme usually consists of an ordered data structure with the classes, relations, theorems, and constraints accepted in a specific domain.
Scenario modeling information technologies include: defining the goals of scenario construction, the set of acceptable scenario states, forming the structure of input and output data arrays, methods of iterative subject domain modeling process, the ontology of a particular subject area, and original software, generating analytical activity scenarios. Recently, analytical activities within information and analytical systems, as a rule, mean a set of actions of the user (analyst) in the software system for data collection, accumulation, processing, and analysis to justify and prepare decisions or generate new knowledge. In the tasks of analytical activities, the scenario is a set of methods for describing and organizing the steps of analytical activities under given constraints. In this case, the ontology of the subject area, as a basis for building a model in the form of a graph, describes the structure of the subject domain and provides definitions for a set of concepts and possible relations between them [9,10]. This makes it possible to develop different scenarios for achieving the objectives of analytical research, taking into account the variety of factors and limitations. The scenario of analytical activity is a sequence of functional tasks, initial, final and intermediate events aimed at achieving the goals of analytical research. The scenario description consists of a branched directed sequence of elements connected by links that form a structure as a directed graph. Each element of this graph is either an action performed by the script executor or an event that affects the subsequent execution of the scenario. The event can be caused both by changes in the external environment and by changes in the state of the object of study. Each action can be described as a function that returns a result, or as a procedure that can implement any sequence of operations with a given level of detail. 
The construction of a typical scenario is often based on a visual representation of the data processing model and a fixed set of requirements. But most of the requirements at the initial stage are vague, without formalizing and taking into account the semantic content of the processes, which leads to the emergence of logical errors in the execution of scenarios and the need to return to the starting point of the modeling process. It complicates the modeling process greatly and increases its duration. Formally, the analytical process is represented by an oriented graph, which reflects the sequence of possible user actions to achieve the goal of analytical research [11]. There are currently many methods for selecting performance-appropriate scenarios from a variety of valid scenarios based on ontology, and for analyzing the nature and composition of factors that may affect the planning and execution process. In practice, the planning and implementation of the scenario determine the features of the ways that differentiate a scenario by factors determined by the values of the criteria based on which decisions are made and procedures for forming typical scenarios are determined. One of the most complex and relevant tasks today, which requires the construction of efficient scenarios for its solution, is the task of building and further optimizing typical scenarios for analytical activities based on branched information. The task of evaluating the efficiency of streaming information collection scenarios for analysis based on branched information is one of the most important tasks of big data processing [12]. 3. 
Efficiency evaluation of the graph-based scenario modeling

The structural approach to ontology consideration is the most appropriate one, as presenting the ontology structure as a multilevel hierarchical graph makes it possible to measure its properties using metrics, which can be used to determine its quality and make recommendations for its improvement. An important aspect of the structural approach is the need to assess the efficiency of the ontological model by the reduction of search costs on the graph describing the ontology, compared with an unfocused search for a path. Such graphs typically use cost estimates both for the information-processing actions at individual nodes of the graph and for the transitions along the edges of the graph when moving from one node to another. Thus, it is possible to estimate the total cost of each possible path of the scenario, or of a separate fragment of this path, in order to compare paths and determine the most efficient ones included in the description of the selected scenario [13].

The sequence of steps of the algorithm for evaluating the efficiency of scenario modeling by analytical activities is as follows.
Step 1. Select on the bottom layer of graph vertices the appropriate assessment of the efficiency of finding the optimal path on the graph according to the relevant criteria.
Step 2. Analyze the relations of the selected vertices with the vertices of the next level, which is located higher in the hierarchy of the model.
Step 3. Save the graph edges that correspond to the found links for further analysis and use in search operations.
Step 4. Repeat the iterations up to the top-level search source.
Step 5. Consolidate all collected edges and vertices of the graph, which is built based on the appropriate assessment of efficiency, i.e., the query for information.
Step 6. Arrange all paths on a constant graph according to the evaluation function.
Step 7. Analyze the graphical model of the scenario, which is based on the choice of the shortest path with the minimum number of edges on the graph (Figure 1).

The efficiency of the modeling process for a scenario of analytical activity, treated as the efficiency of the search for an optimal path on a graph, can be defined as the aggregate estimate of the costs of actions at each node and of the costs of transition to the next node, from the initial node to the final one. The criterion for the nodes is

$$P_a = \sum_{k \in K} (t_k + r_k), \tag{1}$$

where $t_k$ is the time spent at each of the $k$ nodes corresponding to the selected path on the graph; $r_k$ is the resources spent at each of the $k$ nodes corresponding to the selected path on the graph; $k \in K$ are the node numbers of the selected path on the graph.

Accordingly, the evaluation criterion for the transitions is defined as the sum of the costs of processing on the edges between nodes:

$$P_d = \sum_{i,j \in K} (T_{ij} + R_{ij}), \tag{2}$$

where $T_{ij}$ is the time spent on each arc $ij$ of the graph corresponding to the selected path; $R_{ij}$ is the resources spent on each arc $ij$ of the graph corresponding to the selected path; $i, j \in K$ are the node numbers of the selected path on the graph.

Then the general criterion can be defined as the sum of $P_a$ and $P_d$:

$$P = P_a + P_d \tag{3}$$

or

$$P = \sum_{k \in K} (t_k + r_k) + \sum_{i,j \in K} (T_{ij} + R_{ij}), \tag{4}$$

where $k, i, j \in K$ and $K$ is the set of vertices that correspond to the selected path.
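The criteria (1)-(4) can be illustrated with a short sketch. It is not the authors' software: the function name and the sample values are hypothetical, and it assumes that node and arc costs are simply added and that time and resources are expressed in comparable units so that they can be summed directly.

```python
def path_cost(path, node_cost, arc_cost):
    """Return (P_a, P_d, P) for a path given as a list of node ids.

    node_cost[k] is the pair (t_k, r_k); arc_cost[(i, j)] is (T_ij, R_ij).
    """
    # Criterion (1): time and resources spent at the nodes of the path.
    p_a = sum(t + r for t, r in (node_cost[k] for k in path))
    # Criterion (2): time and resources spent on the arcs between
    # consecutive nodes of the path.
    p_d = sum(
        T + R for T, R in (arc_cost[(i, j)] for i, j in zip(path, path[1:]))
    )
    # Criteria (3)/(4): the general criterion is the sum of both parts.
    return p_a, p_d, p_a + p_d


# Hypothetical three-node path with made-up costs.
sample_path = ["a0", "a1", "a2"]
sample_node_cost = {"a0": (1, 2), "a1": (3, 1), "a2": (2, 2)}
sample_arc_cost = {("a0", "a1"): (1, 1), ("a1", "a2"): (2, 3)}

p_a, p_d, p = path_cost(sample_path, sample_node_cost, sample_arc_cost)
```

For the sample data the node part is 3 + 4 + 4 = 11, the arc part is 2 + 5 = 7, and the general criterion is 18.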
Figure 1: An example of a model for constructing the optimal path from the starting node $a_0$ to the final node $a_8$ along possible arcs $d_{ij}$: a) possible paths; b) selected optimal path

Thus, the criterion can combine both efficiency criteria with an estimate of the results obtained at a given node, which is determined for the given task by the presence of the necessary results as assessed by expert analysts:

$$P = \sum_{k \in K} (c_t t_k + c_r r_k) + \sum_{i,j \in K} (c_T T_{ij} + c_R R_{ij}), \tag{5}$$

where $k, i, j \in K$; $c_t$ is the impact coefficient of the time spent at each of the $k$ nodes; $c_r$ is the impact coefficient of the resources spent at each of the $k$ nodes; $c_T$ is the impact coefficient of the time spent on each arc $ij$ of the graph; $c_R$ is the impact coefficient of the resources spent on each arc $ij$ of the graph. In addition, $c_t, c_r, c_T, c_R \ge 0$. Depending on the values of these coefficients, the analyst determines the type of task of finding the optimal path: by the criterion of time, by the minimum cost of resources, or by a combined criterion.

The overall direction of the path can be determined by choosing, at each successive step on the graph, the most efficient direction in terms of the value of criterion V. The trigger for evaluating the results at each step determines whether there is an improvement in the results or not:
$Q_i > 0$ - node $a_i$ is considered feasible for comparison and selection;
$Q_i = 0$ - node $a_i$ is not considered feasible for comparison and selection.
This assessment should be performed by the analyst based on the obtained results. For each level, only those directions (edges) and nodes corresponding to $Q_i > 0$ are considered in the comparison. Other directions (edges) and nodes are removed from further consideration. The value of $Q_i > 0$ can be determined by measures selected by the analyst as an estimate of the gain in the results of solving the analytical problem.
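The step-by-step choice described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not define criterion V explicitly, so the sketch assumes it is the weighted cost of one step taken from (5); the function names, the dictionary `q` holding the analyst's Q values, and all numbers are hypothetical.

```python
def weighted_step_cost(node, arc, c_t=1.0, c_r=1.0, c_T=1.0, c_R=1.0):
    """Weighted cost of one step, following the form of criterion (5).

    node is the pair (t, r) of the candidate node; arc is (T, R) of the arc
    leading to it. All impact coefficients must be non-negative.
    """
    t, r = node
    T, R = arc
    return c_t * t + c_r * r + c_T * T + c_R * R


def choose_next(current, candidates, node_cost, arc_cost, q, **coeffs):
    """Pick the cheapest feasible successor of current, or None.

    Only candidates with Q > 0 take part in the comparison; the rest are
    removed from consideration (assumption: a missing Q value means 0).
    """
    feasible = [n for n in candidates if q.get(n, 0) > 0]
    if not feasible:
        return None
    return min(
        feasible,
        key=lambda n: weighted_step_cost(
            node_cost[n], arc_cost[(current, n)], **coeffs
        ),
    )


# Hypothetical data: a3 is the cheapest node but gives no new results (Q = 0).
demo_node_cost = {"a1": (3, 1), "a2": (1, 4), "a3": (0, 0)}
demo_arc_cost = {("a0", "a1"): (2, 1), ("a0", "a2"): (1, 3), ("a0", "a3"): (0, 0)}
demo_q = {"a1": 1, "a2": 1, "a3": 0}
demo_candidates = ["a1", "a2", "a3"]

combined = choose_next("a0", demo_candidates, demo_node_cost, demo_arc_cost, demo_q)
time_only = choose_next(
    "a0", demo_candidates, demo_node_cost, demo_arc_cost, demo_q, c_r=0.0, c_R=0.0
)
```

With all coefficients equal to one the combined criterion selects a1 (cost 7 against 9 for a2), while setting the resource coefficients to zero turns the task into a time-only search and selects a2 (cost 2 against 5). Note that such a greedy step-by-step choice does not by itself guarantee a globally optimal path.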
If no established and reasonable method for estimating $Q_i$ is available, $Q_i = 1$ is assigned to a node $a_i$ that is considered to improve the decision of the analyst. If $Q_i = 0$, node $a_i$ is not considered for comparison and selection, as one that does not provide new knowledge or new results for the analyst.

Without loss of generality, the model that considers both the overall cost estimate for each node and the cost estimate for the transition to the next node on the path from the initial node to the final one can be treated as a model that takes into account only the costs on the edges of the graph. In this case, the costs at the nodes are included in the costs of the edges of the graph:

$$P = \sum_{i,j \in K} (t_{ij} + r_{ij} + T_{ij} + R_{ij}), \tag{6}$$

where $i, j \in K$.

Based on the general description of the characteristics of the ontology model graph, the problem of finding the shortest path on the graph (Single Source Shortest Path - SSSP) [14,15] can be formulated as follows. It is necessary to specify the path $P^*(a_S, a_F)$ from the initial state $a_S$ to the final state $a_F$ for the graph in Figure 2 that has the lowest possible total weight: $f^*(a_S) = f(P^*(a_S, a_F)) = \min f(P(a_S, a_F))$.

For each level of the graph $j = 1 \dots m$, $A = \{a_{ij}\}$ for $i = 1 \dots n$ is a complete set of actions whose results are similar in quality but differ in performance. If $d_{ijkl}$ is an arc between $a_{ij}$ and $a_{kl}$, and $d_{ijil}$ is an arc between $a_{ij}$ and $a_{il}$, then in most cases we can assume that $d_{ijil} \ll d_{ijkl}$ for $i, k = 1 \dots n$ and $j, l = 1 \dots m$, $i \ne k$. A typical scenario defines a set of arcs $d^T_{i(i+1)}$ for $i = 1 \dots n-1$. A scenario can be defined as improved if the total estimate of the new scenario, formed by the new chain of arcs $d_{ij(i+1)l}$, is less than the sum of the estimates of the arcs $d^T_{i(i+1)}$ of the typical scenario.

Figure 2: Graph structure for script building

The algorithm for performing a sequential check of the obtained scenario for its optimality can be composed of the following steps.
1.
For each level of the scenario $i = 1 \dots n-1$ and $j, l = 1 \dots m$, the value of $d_{ij(i+1)l}$ is compared with $d^T_{i(i+1)}$.
2. If the values are $d_{ij(i+1)l}$