Formation of Reports for Model-Driven Data Consolidation System*

Aleksei Korobko [0000-0002-6227-1362]

Institute of Computational Modeling of the Siberian Branch of the Russian Academy of Sciences, 50/44 Akademgorodok, Krasnoyarsk, 660036, Russia
agok@icm.krasn.ru

Abstract. A platform for constructing model-driven systems was developed by the staff of ICM SB RAS to support the consolidation, storage and analytical processing of monitoring data on the state of natural and technogenic objects, medical data, scientific research data, etc. The platform is based on an original approach to the construction of model-driven systems, a meta-metamodel and a set of algorithms. The algorithms are responsible for navigating between models, creating structures in the database, building the user interface, etc. One problem that has not yet been solved within the platform for building data consolidation systems is the automated creation of analytical reports. The existing solution makes it difficult for users to conduct analytical experiments independently, as it requires contacting the administrator or mastering SQL queries. Tools that let users form analytical queries to the data on their own are therefore urgently needed. The article proposes an algorithm for generating queries and a builder of exploration queries that implements this algorithm.

Keywords: Web-system, Ad-hoc Data Consolidation, Model-driven Development, Dynamic User Interface, Metadata, Report.

1 Introduction

The staff of ICM SB RAS have developed many systems focused on data collection. The system for consolidating the monitoring data of the regional center for monitoring and forecasting emergencies makes it possible to track the state of natural and technogenic objects, for example, the water level in rivers and the number of road accidents [1].
A consolidation system has been developed for scientific organizations that records the results of scientific work: publication activity of employees, patent activity, etc. [2, 3]. Much work has been done to facilitate the collection and storage of research results [4]. Systems have also been developed for collecting medical questionnaire information, environmental monitoring data of the Krasnoyarsk reservoir [5] and data for assessing the ecological state of soils in the Krasnoyarsk Region.

* Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

The main feature of these data consolidation systems is that they were developed using an original modification of the model-driven approach (MDD, model-driven development) [6, 7]. MDD uses models as the central component of the software development process. The models are divided into levels of abstraction, and the transition from models of one level to models of another level is regulated. In conventional MDD, the result of development is the program code of the system. The modification consists in using the tools of the data consolidation system itself to edit the control model, so that the behavior of the system changes immediately in response to changes in the model [8]. The author's implementation of the model-driven approach allows the users themselves to evolve the created data consolidation systems dynamically.

More than five years of experience in creating model-driven systems for consolidating and storing data, together with the systematization of the accumulated knowledge, made it possible to develop a universal meta-metamodel of the data collection process. The meta-metamodel describes the main classes of entities and relationships involved in the data collection process; these classes and relationships are used to build control models for specific application systems.
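One of the platform's tasks named in the abstract is creating structures in the database from the control model. The paper does not publish the structure of the control model, so the following is only a sketch under stated assumptions of how such a step could look: the classes `Attribute` and `EntityClass`, the function `table_ddl`, the `parent_id` foreign-key convention and the use of SQLite are all hypothetical, and only the class and attribute names come from the soil example in Section 2.2.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, field
from typing import Optional


def quote(identifier: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


@dataclass
class Attribute:
    # Hypothetical control-model attribute: a name and an SQL column type.
    name: str
    sql_type: str


@dataclass
class EntityClass:
    # Hypothetical control-model class; `parent` names the class it belongs to.
    name: str
    attributes: list[Attribute] = field(default_factory=list)
    parent: Optional[str] = None


def table_ddl(cls: EntityClass) -> str:
    """Derive a CREATE TABLE statement from one control-model class."""
    columns = ["id INTEGER PRIMARY KEY"]
    columns += [f"{quote(a.name)} {a.sql_type}" for a in cls.attributes]
    if cls.parent is not None:
        # The relationship to the parent class becomes a foreign key column.
        columns.append(f"parent_id INTEGER REFERENCES {quote(cls.parent)}(id)")
    return f"CREATE TABLE {quote(cls.name)} ({', '.join(columns)})"


# Illustrative control model; the attributes and the link are assumptions.
control_model = [
    EntityClass("Sampling place", [Attribute("Settlement", "TEXT"),
                                   Attribute("Description", "TEXT")]),
    EntityClass("Soil section", [Attribute("Name", "TEXT")],
                parent="Sampling place"),
]

conn = sqlite3.connect(":memory:")
for entity in control_model:
    conn.execute(table_ddl(entity))
```

Regenerating the DDL from the model, rather than hand-writing it, is one simple way for the database structure to follow edits to the model; the platform's own algorithms may work differently.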
A set of algorithms for automatically generating the applied models and the system interface from the control model was also created and tested. Together, the original approach to the construction of model-driven systems, the meta-metamodel and the set of algorithms made it possible to create a platform for building data consolidation systems [9]. Building data consolidation systems on this software platform ensures that the consolidated data are systematized and stored consistently. In addition, the control model formed during the construction and development of a system is, in effect, a description of the subject area and can be reused.

One problem that has not yet been solved within the platform for building data consolidation systems is the automated creation of analytical reports. The existing solution is not optimal: the user defines the composition of the report and, together with the administrator, builds a view in the database. The metadata of the created view are saved using the tools built into the database and then read by the analytical module. The user can then view the data included in the report as tables, graphs, and cartograms. This solution makes it difficult for users to manage analytical experiments independently, since it requires contacting the administrator or having skills in writing SQL queries.

Using the control model allows us to create a tool that greatly simplifies the formation of analytical reports. The analytical reporting tool, like the entire model-driven system, should be based on specific models. The CWM (Common Warehouse Metamodel) specification [10] is the most widely used specification in the field of analytical data processing. The analytical model of the CWM specification is based on the fact-dimension model proposed by Matteo Golfarelli et al. in 1998 [11].
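The existing workflow described above, in which an administrator writes a report view by hand and the analytical module discovers the view's columns from the database catalog, can be illustrated with a small sketch. SQLite, through Python's standard `sqlite3` module, stands in for the platform's DBMS, which the paper does not name; the table, view and column names are hypothetical, and `PRAGMA table_info` is SQLite's way of reading column metadata, not necessarily the mechanism the platform uses.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sampling_place (id INTEGER PRIMARY KEY, settlement TEXT);
CREATE TABLE humus_state (
    id INTEGER PRIMARY KEY,
    place_id INTEGER REFERENCES sampling_place(id),
    humus_pct REAL
);
-- A report view of the kind an administrator would write by hand.
CREATE VIEW report_humus AS
SELECT p.settlement AS settlement, AVG(h.humus_pct) AS avg_humus_pct
FROM humus_state AS h
JOIN sampling_place AS p ON h.place_id = p.id
GROUP BY p.settlement;
""")


def list_views(conn: sqlite3.Connection) -> list[str]:
    """Names of all views, read from SQLite's schema table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name")
    return [name for (name,) in rows]


def view_columns(conn: sqlite3.Connection, view: str) -> list[str]:
    """Column names of a view, read from the catalog instead of hard-coded."""
    quoted = '"' + view.replace('"', '""') + '"'
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
```

Because the column list comes from the catalog, a generic module can display a newly created view without code changes.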
According to the fact-dimension model, data are divided into aspects of analysis (dimensions) and numerical characteristics to be aggregated (measures). Measures, in turn, are combined into facts, called cubes in CWM terminology. An algorithm for constructing an analytical model (dimensions, cubes, and the connections between them) from the data of the control model was developed and presented in [4]. An urgent task is to create a visual designer of exploration queries to the accumulated data that uses the resulting model. The rest of the paper describes the algorithm for forming a user query to the database based on the analytical model and shows the interface for forming it.

2 Exploration Query Builder

2.1 Algorithm for Generating Exploration Queries

The existing analytical module of the platform for building data consolidation systems has been tested on several large tasks, and about a hundred reports have been created in it. Redoing the existing reports would be very costly, and the same applies to the primary analysis tools already implemented. Therefore, users need a tool that lets them create reports intuitively in the format already accepted in the system. In addition, the description of the report (the analytical model) must be saved as metadata, so that reports can be edited and supplemented in the future.

As described above, creating a report without the builder consists of a few simple steps: first, the user defines the classes and attributes involved in the report and the aggregation function, and then the administrator writes the query. The exploration query builder must make it easier to select classes and attributes and must automate the construction of the view in the database. In terms of the CWM specification, the algorithm for generating an exploration query is shown in Fig. 1.
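A minimal way to hold such an analytical model in memory, and to implement the first stage of the algorithm (hiding branches unrelated to the selected cubes), is a graph whose nodes are cubes. In the sketch below the classes `Cube` and `AnalyticalModel`, the undirected links and the breadth-first search are assumptions for illustration. The cube, attribute and measure names are taken from the example in Section 2.2 (only the attributes named there are listed), while the links between cubes are invented.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cube:
    # A CWM-style cube: measures analysed along dimension attributes.
    name: str
    dimension_attributes: list[str] = field(default_factory=list)
    measures: list[str] = field(default_factory=list)


@dataclass
class AnalyticalModel:
    cubes: dict[str, Cube] = field(default_factory=dict)
    # Undirected connections between cubes (hypothetical representation).
    links: dict[str, set[str]] = field(default_factory=dict)

    def add_cube(self, cube: Cube) -> None:
        self.cubes[cube.name] = cube
        self.links.setdefault(cube.name, set())

    def connect(self, a: str, b: str) -> None:
        self.links[a].add(b)
        self.links[b].add(a)

    def visible_cubes(self, selected: set[str]) -> set[str]:
        """Cubes still shown once `selected` are chosen: every cube reachable
        from the selection, found by breadth-first search."""
        if not selected:
            return set(self.cubes)  # nothing chosen yet: show the whole graph
        seen = set(selected)
        queue = deque(selected)
        while queue:
            for neighbour in self.links[queue.popleft()]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen


model = AnalyticalModel()
model.add_cube(Cube("Sampling location", ["Settlement", "Description"]))
model.add_cube(Cube("Soil type", ["Department", "Name"]))  # partial list
model.add_cube(Cube("Humus state", [], ["Humus, %", "Maximum depth, cm",
                                        "Minimum depth, cm",
                                        "Residual luminescence (T), %"]))
model.add_cube(Cube("Biotesting of natural samples"))
model.add_cube(Cube("Biotesting of reference samples"))
# Assumed links: a field-studies branch and a separate biotesting branch.
model.connect("Sampling location", "Soil type")
model.connect("Sampling location", "Humus state")
model.connect("Biotesting of natural samples", "Biotesting of reference samples")
```

With these assumed links, selecting "Sampling location" leaves only the three field-studies cubes visible, so no query can combine cubes from unconnected branches.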
According to the algorithm in Fig. 1, at the first stage the system generates a connected graph of cubes and displays it to the user. Showing only the cubes significantly reduces the amount of displayed information. As cubes are selected, branches unrelated to them are hidden, which prevents the user from building a query over cubes that are not connected to each other.

Fig. 1. Algorithm for generating a report by the user.

At the next step, the user sees the measures and dimension attributes of the selected cubes. Using the graphical interface, the user selects the aspects of analysis of interest in the graph and defines the aggregation functions for the selected measures. An expanded table with the data is then built from these choices, so the user can assess the completeness of the sample and whether it matches the research objectives. The result of this work is a customized analytical model. The system saves the model to the database and automatically builds a view from it. The view follows the metadata conventions accepted for the analytical module, so the resulting report can be viewed in that module immediately after it is created or modified.

2.2 Exploration Query Builder Interface

The exploration query builder works according to the algorithm above. Let us consider it using the example of the Soil Condition Research System. Initially, the system contained the results of field studies, namely the assessment of the ecological state of the soils of settlements in the Krasnoyarsk Region. At each research site, full-profile soil sections were made, the sections were georeferenced, the soils were described by their morphological features and given their full names, and soil samples were taken at different depths. To store this information, the following classes were created: “Sampling Place”, “Soil section”, “Soil type”, “Section horizon”, etc.
The next step was to study the chemical composition of the natural samples, the content of mobile nitrogen and phosphorus, and the humus state. Classes with the same names were created to store this information.

Later, a group of students studied how the results of biotesting depend on the storage duration of the soil samples and on the conditions of sample preparation, and a separate branch was created for this study. It contains the classes “Biotesting of natural samples”, “Biotesting of reference samples”, etc. Unfortunately, because of the poor numbering of the natural samples, they could not be unambiguously matched with the data already available; the matching is still in progress.

To continue the study of natural samples, samples of technogenic surface formations near the industrial enterprises of Krasnoyarsk were collected and analyzed for the concentration of lead, arsenic and fluorine, and their agrochemical indicators were studied. The control model of the system was supplemented with the corresponding classes.

Last year, within the framework of a grant, the Krasnoyarsk reference center conducted an experiment with reference soils to calibrate bioluminescent enzymatic analysis as a biotesting method and to determine the limits of its effectiveness in assessing the degree of soil degradation and the anthropogenic and technogenic pressure on soils. The studies used various enzyme systems and standard methods with reference samples for common pollutants. The following classes were added to the system: “Analysis for pesticides”, “Analysis for copper”, “Physicochemical properties of the sample”, etc.

Thus, the Soil Condition Research System currently contains data from five different experiments. The analytical model containing information about all the cubes created during the development of the system is shown in Fig. 2.

Fig. 2. Graph of the cubes.

At the first stage of constructing an exploration query, the user can select any of the proposed cubes, but once the first cube is selected, the unrelated branches disappear from the graph. For example, if the cube “Sampling location” is selected, only the “Field studies” branch associated with it remains available (Fig. 3). Fig. 3 also shows that the user has selected the cubes “Humus state” and “Soil type”. Selected cubes display their related elements, namely dimension attributes and measures, and the user selects the attributes and indicators of interest with simple mouse actions. The cube “Sampling location” has two attributes, “Settlement” and “Description”; “Settlement” is selected. “Soil type” has four attributes, of which “Department” and “Name” are selected. The “Humus state” cube has four indicators; the selected elements are “Humus, %”, “Maximum depth, cm”, “Minimum depth, cm”, “Residual luminescence (T), %” and “Soil sample”.

Fig. 3. Graph with the measures and dimensions.

After selecting the aspects of analysis of interest, the user can view a detailed description of the selected attributes and indicators, generate a database query from them and examine the resulting data in detail (Fig. 4). Fig. 4 shows that almost all indicators are filled in, but some data are missing: for example, the soil type section is empty, and the settlement field contains service information. If necessary, these data can be added or edited in the data entry module. If the user is not satisfied with the composition of the report, he can return to the previous stage of selecting attributes and measures. If the data are selected correctly, the user proceeds to choosing the aggregation functions for the indicators, in our case “Humus, %”, “Maximum depth, cm”, “Minimum depth, cm”, “Residual luminescence (T), %” and “Soil sample”.
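The final step, turning the selected dimension attributes and the per-measure aggregation functions into a database view, might look roughly like the sketch below. It assumes that the detailed data shown in Fig. 4 are available as a single flat table (called `expanded` here, a hypothetical name), uses SQLite in place of the platform's DBMS, and whitelists five standard SQL aggregates; the function `build_view_sql` and the column-alias scheme are illustrative, since the paper does not describe the platform's actual view layout or metadata conventions.

```python
from __future__ import annotations

import sqlite3

# Aggregate functions offered to the user (an assumed, minimal set).
AGGREGATES = {"AVG", "SUM", "MIN", "MAX", "COUNT"}


def quote(identifier: str) -> str:
    """Quote an SQL identifier such as "Humus, %" safely."""
    return '"' + identifier.replace('"', '""') + '"'


def build_view_sql(view: str, source: str,
                   dimensions: list[str], measures: dict[str, str]) -> str:
    """Turn the user's selection into CREATE VIEW ... GROUP BY SQL.

    dimensions: dimension attributes to group by.
    measures:   measure column -> aggregate function name.
    """
    select = [quote(d) for d in dimensions]
    for column, func in measures.items():
        func = func.upper()
        if func not in AGGREGATES:  # function names cannot be quoted, so whitelist
            raise ValueError(f"unsupported aggregate function: {func}")
        alias = quote(f"{func.lower()} {column}")
        select.append(f"{func}({quote(column)}) AS {alias}")
    sql = f"CREATE VIEW {quote(view)} AS SELECT {', '.join(select)} FROM {quote(source)}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(quote(d) for d in dimensions)
    return sql


# Usage on a toy expanded table (made-up values, not data from the paper).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "expanded" ("Settlement" TEXT, "Humus, %" REAL, '
             '"Maximum depth, cm" REAL)')
conn.executemany('INSERT INTO "expanded" VALUES (?, ?, ?)',
                 [("Settlement A", 2.0, 10), ("Settlement A", 4.0, 20),
                  ("Settlement B", 3.0, 15)])
conn.execute(build_view_sql("humus report", "expanded", ["Settlement"],
                            {"Humus, %": "avg", "Maximum depth, cm": "max"}))
rows = conn.execute('SELECT * FROM "humus report" ORDER BY "Settlement"').fetchall()
```

Every view and column name is quoted because names such as "Humus, %" contain spaces and punctuation; aggregate function names cannot be quoted as identifiers, which is why they are checked against a fixed set instead.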
After selecting the aggregation functions, the user starts the saving process. As a result, the analytical model is saved in the database and a report is created, which is available for viewing in the analytical module.

Fig. 4. Table with the data.

The exploration query builder implements the proposed algorithm. It allows users to conduct analytical experiments quickly by creating new analytical models or editing existing ones.

3 Conclusion

The platform for building data consolidation systems is used in many subject areas to support the collection, storage, and analytical processing of data. The proposed algorithm for forming exploration queries and the query builder based on it are expected to make data analysis easier for users and to attract new researchers to the platform.

Acknowledgments. The study was carried out with the financial support of RFBR and the Government of Krasnoyarsk region, research project №18-47-240005.

References

1. Korobko, A., Nicheporchuk, V., Nozhenkov, A.: Dynamic Generating User Interface of the Data Consolidation Web-system for Emergency Monitoring. Informatization and Communication 3, 59–64 (2014) (in Russian)
2. Korobko, A., Korobko, A.: Applied model of the scientific activity accounting system. In: Proceedings of the International Scientific and Practical Conference on the Actual Problems of Mathematical Modeling and Information Technologies, pp. 71–75, Sochi (2015) (in Russian)
3. Korobko, A., Korobko, A., Karepova, E.: Certificate of state registration of a computer program No. 2018614158 dated 02.22.2018. Copyright holder: FRC KSC SB RAS
4. Korobko, A., Korobko, A.: Constructing the analytical model for specialized model-driven system of scientific data consolidation. In: CEUR Workshop Proceedings, vol. 2534, pp. 377–383 (2019)
5. Korobko, A., Korobko, A.: Information Modeling of Temporal Spatial Data for Ecological Monitoring of the Krasnoyarsk Reservoir. In: CEUR Workshop Proceedings, vol. 2033, pp. 319–323 (2017)
6. Object Management Group (OMG): Model Driven Architecture (MDA). MDA Guide Revision 2.0 (ormsc/2014-06-01), June (2014)
7. Seidewitz, E.: What models mean. IEEE Softw. 20(5), 26–32 (2003)
8. Korobko, A., Korobko, A.: An original approach to the construction of a model-driven data consolidation system. Informatization and Communication 4, 232–238 (2017) (in Russian)
9. Korobko, A., Korobko, A.: A software platform for constructing model-driven systems for primary data consolidation. In: Proceedings of the VII International Conference “Knowledge-Ontology-Theories” (ZONT-2019), pp. 203–212, Novosibirsk (2019) (in Russian)
10. Peyton, L.: Common Warehouse Metamodel. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems. Springer, Boston, MA (2009). https://doi.org/10.1007/978-0-387-39940-9_900
11. Golfarelli, M., Maio, D., Rizzi, S.: The Dimensional Fact Model: A Conceptual Model for Data Warehouses. Int. J. Cooperative Inf. Syst. 7, 215–247 (1998)