=Paper=
{{Paper
|id=Vol-2534/26_short_paper
|storemode=property
|title=UNISAT. The Technology for Development of United Systems of Maintaining Extra Large Distributed Archives of Heterogeneous Satellite Data
|pdfUrl=https://ceur-ws.org/Vol-2534/26_short_paper.pdf
|volume=Vol-2534
|authors=Andrey A. Proshin,Evgeny A. Loupian,Alexandr V. Kashnitskii,Ivan V. Balashov
}}
==UNISAT. The Technology for Development of United Systems of Maintaining Extra Large Distributed Archives of Heterogeneous Satellite Data==
Andrey A. Proshin, Evgeny A. Loupian, Alexandr V. Kashnitskii, Ivan V. Balashov

Space Research Institute RAS, Moscow

Abstract. The rapid development of satellite Earth remote sensing has led to a significant increase in the requirements for systems that maintain satellite data archives. The article describes the UNISAT technology, designed to build systems for maintaining extra large distributed archives of heterogeneous satellite data, providing dynamic generation of data at the user's request and a wide range of tools for remote analysis and data processing.

Keywords: UNISAT Technology

===1 Introduction===

The rapid development of satellite remote sensing systems in recent decades has led to an explosive increase in the volume of satellite data obtained from a wide variety of observation instruments [1]. The field of application of Earth remote sensing data is also expanding: the data are now widely used for a variety of research and applied tasks concerning the natural environment and anthropogenic objects. In turn, all this leads to a significant increase in the requirements for systems that provide satellite data processing and, in particular, for systems for maintaining satellite data archives, which provide the back end for data analysis.

One of the main requirements for modern satellite data archiving systems is support for heterogeneous satellite data obtained by observation devices with different technical characteristics (observation frequency, spatial resolution, repeatability of observations, etc.). This leads to the need for unification of data archiving procedures, development of a common database structure, and implementation of common software interfaces to access the variety of satellite data types.
Another important requirement is support for extra large distributed satellite data archives, which enables archives located at a number of satellite data acquisition and storage centers to operate jointly as a single information resource. Users can thus access the data regardless of where the data are physically located.

The requirements for data access services have also changed dramatically in recent years. Previously, users of satellite data were mostly satisfied with obtaining raw data for use in their own processing and analysis systems. Now they are increasingly interested in accessing ready-made data products at a number of processing levels [2], and the number of such data products required to solve specific problems is constantly growing. Since satellite data tend to take a lot of disk space, storing all possible data products derived from the same source data becomes impractical and, in many cases, technically impossible. A reasonable way out of this situation is to provide users with access to "virtual" data products, i.e. products that are dynamically built from source data in real time. The key advantage of this approach is the ability to expand the list of data products available to users without the need for mass processing of the data archives.

In our opinion, one of the most urgent development directions for modern satellite data access systems is the implementation of various tools for satellite data processing and analysis that were previously available only in specialized desktop applications. Data analysis tools implemented in the interfaces of such systems enable the processing of large amounts of available satellite information using the capacity of the data centers. The most prominent representatives of such systems, in our opinion, are Google Earth Engine [3] (https://earthengine.google.org) and the "Vega-Science" system [4], implemented within the framework of the "IKI-Monitoring" center for collective use [5].

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

This article gives a brief description of the key features of the UNISAT technology developed at IKI RAS [6], which is designed to create unified systems for maintaining extra large distributed archives of heterogeneous satellite data. The technology, in turn, builds on technologies and software developed at IKI RAS in recent decades to build a wide range of satellite data archives [7,8]. UNISAT implements a common approach to processing a variety of satellite data types, which differ both in spatial resolution and in the scheme of data storage. An important advantage of the developed technology is its support for the "virtual data products" mechanism, i.e. products that are dynamically generated upon the user's request by processing the satellite data available in the archives. The development of this technology was also focused on supporting a wide range of tools for remote analysis and processing of satellite data.

===2 General Architecture===

Years of experience in creating and operating various systems for satellite data access enabled us to develop a common architecture for building systems for maintaining satellite data archives. Its key elements are the new unified structure of the database for storing heterogeneous satellite data and the structure of the reference database, which contains detailed information about satellites, observation instruments and implemented products, as well as the schema for building "virtual" data products. The general architecture of a node of the UNISAT distributed archive management system is shown in Fig. 1.
Data can be imported into the archives both from the satellite data processing subsystem and from external suppliers of satellite data. The left part of the diagram represents the software components responsible for satellite data archiving and data exchange with other information centers. The middle part of the diagram contains the reference database unisat_catalog, the unisat database, which contains metadata on the data available in the archive, and the connected file storage, which physically contains the satellite data files. The right part of the diagram shows the main services designed to provide access to the data in the archive. The cartographic web interface is the main tool for providing users with access to both the satellite data itself and the services designed to analyze it. In the diagram, dotted arrows show requests for data or metadata, solid arrows show the metadata flow, and hollow arrows show the data flow. External components that are not part of the archiving system are shown in dotted shapes.

Figure 1. The general architecture of the node of the distributed archive.

Fig. 2 presents a schematic diagram of a distributed satellite data archive implemented with the presented technology. The diagram shows the main data and metadata flows within the distributed archive. Data can be fed into the archives from both external data centers and local reception stations. For each information center, a policy for exporting data or metadata to the remaining centers of the distributed archive can be defined individually, but in the simplest version each center contains all the information about the data available in the distributed archive. Each center can implement its own subset of satellite data archives, but the reference database unisat_catalog always remains synchronized with the central server.
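The per-center export policy described above could be modeled along the following lines. This is a minimal sketch under stated assumptions, not the UNISAT implementation: the ExportPolicy class, its fields, the plan_export function, the record layout and the example product names and paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportPolicy:
    # Hypothetical per-center policy; field names are illustrative only.
    export_all_metadata: bool = True             # the "simplest version" in the text
    metadata_products: frozenset = frozenset()   # used when export_all_metadata is False
    data_products: frozenset = frozenset()       # products whose files are also copied

    def exports_metadata(self, product: str) -> bool:
        return self.export_all_metadata or product in self.metadata_products

    def exports_data(self, product: str) -> bool:
        return product in self.data_products


def plan_export(policy: ExportPolicy, records: list) -> tuple:
    """Return (metadata records to send, file paths to send) for one remote center."""
    metadata = [r for r in records if policy.exports_metadata(r["product"])]
    files = [r["file"] for r in records if policy.exports_data(r["product"])]
    return metadata, files


# Two newly archived records (made-up product names and paths).
records = [
    {"product": "ndvi", "file": "/archive/ndvi_001.tif"},
    {"product": "rgb", "file": "/archive/rgb_001.tif"},
]

# Every center learns about all data, but only "ndvi" files are physically copied.
full_catalogue = ExportPolicy(data_products=frozenset({"ndvi"}))
metadata, files = plan_export(full_catalogue, records)
print(len(metadata), files)
```

Keeping the metadata rule and the data rule separate mirrors the distinction the text draws between metadata flows and data flows: a center can know that a product exists elsewhere without holding its files.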
The main advantage of the presented implementation of the distributed satellite data archive is a high degree of flexibility in determining which types of metadata and data should be transferred between the data centers within the distributed archive, which is practically impossible with standard database replication tools.

Figure 2. Schematic design diagram of the distributed data archives.

===3 The Unified UNISAT Database Structure for Satellite Data Storage===

To define the unified database structure, we first introduce a number of terms. A session is a set of data uniquely identified by the following fields: date and time (dt), satellite (satellite), station (station), and instrument (device). A fragment is a spatial part of a session. A session can consist of a single fragment or a set of fragments.

The UNISAT database has two main tables for storing metadata: the fragments table, which describes the fragments, and the fragment_products table, which defines the data products specific to the respective fragments (see Fig. 3). In the figure, the primary keys of the tables are shown in bold, and the fields of the unique key are shown in italics. If a session consists of one fragment, one record with the fragment type “single_fragment” is added to the fragments table. If the session consists of many fragments, each of them corresponds to a separate record with the fragment type “fragment” and a unique fragment number. In addition, fragments of the type “products_contour” are automatically generated to define the outer contours of products or product groups within the session. The scale_level field in the fragment_products table is used to identify coarse versions of data products, which accelerate the display of images at scales coarser than the resolution of the satellite data itself. It is significant that the same products can have different spatial partitioning at different scales.

Figure 3.
The structure of the main UNISAT database tables.

The developed structure makes it possible to process the data efficiently both as separate fragments and as integrated products consisting of many thousands of separate fragments. The advantages of the described structure also include flexible support for additional spatial partitions at coarse scales, which accelerates the formation of the required images at a given scale.

===4 The Structure of the unisat_catalog Common Reference Database===

The database is designed to maintain all the necessary reference information about receiving stations, satellites, characteristics of satellite instruments, etc. It also contains information about the types of products stored in the archives and the schema for obtaining "virtual" information products on their basis. Table 1 below presents the composition and purpose of the main tables of the unisat_catalog database, indicating the type of reference information stored in them. The key distinguishing feature of the proposed structure of the reference database is the integration of common information on satellites, instruments and their respective data types with the information necessary to obtain "virtual" data products. At the same time, the implemented services for obtaining extended metadata provide the information necessary for implementing tools for remote analysis and processing of satellite data.

Table 1.
The composition and purpose of the unisat_catalog database tables
{| class="wikitable"
! Information type !! Table !! Purpose
|-
| rowspan="6" | General reference information
| satellite || information on satellites
|-
| satellite_device || information on instruments installed on satellites
|-
| device || information on satellite instruments
|-
| band || information on satellite instrument channels
|-
| station || information on satellite data receiving stations
|-
| center || information on data centers
|-
| rowspan="4" | Description of the types of products stored in the archives
| product || description of types of data products
|-
| product_cases || information on the types of products built from the data of specific satellite instruments
|-
| product_level || information on "coarse" product scales
|-
| channel || information on channels of information products
|-
| rowspan="3" | Schema for obtaining "virtual" products based on the processing of available data in the archive
| vproduct || description of virtual product types
|-
| vproduct_cases || variants for the implementation of virtual products depending on the types of satellite instruments and products in the archive
|-
| vchannel || schema for obtaining channels of the virtual products
|}

===5 Conclusion===

Currently, the UNISAT technology is being successfully used in a number of information systems for satellite data access developed by IKI RAS in cooperation with other organizations. Among them, it is particularly worth noting the "IKI-Monitoring" satellite data center for collective use [5], which provides direct access to satellite data from more than 30 different Earth observation instruments, with the total amount of available data now significantly exceeding 2 petabytes. The same technology is used to maintain the archives of the unified satellite data processing system of SRC "Planeta" [9], which contain data from both Russian and foreign Earth remote sensing satellite systems. In the future, it is planned to provide effective support for a wider range of tools for remote analysis and processing of satellite data.

Acknowledgements.
The basic functionality of the created system was developed with the support of FANO (the "Monitoring" theme, registration no. 01.20.0.2.00164). Elements connected to the support of the virtual products functionality were developed with the support of RFBR (grant 16-37-00427 mol_a). Functions that support the interaction of nodes of extra large satellite data archives were worked out using distributed satellite data archives in the centers of SRC "Planeta" with the support of RFBR (grant 15-29-07953 ofi-m). Since 2019, the technologies have also been developed with the support of FANO (the "Big Data in space research: astrophysics, solar system, geosphere" theme, registration no. 0024-2019-0014).

===References===

[1] Loupian E.A., Bourtsev M.A., Proshin A.A., Kobets D.A. Evolution of remote monitoring information systems development concepts // Actual Problems of Remote Sensing of the Earth from Space. 2018. Vol. 15. No. 3. P. 53-66. DOI: 10.21046/2070-7401-2018-15-3-53-66.

[2] Loupian E.A., Savorsky V.P. Basic products of Earth Remote Sensing Data Processing // Actual Problems of Remote Sensing of the Earth from Space. 2012. Vol. 9. No. 2. P. 87-97.

[3] Moore R.T., Hansen M.C. Google Earth Engine: a new cloud-computing platform for global-scale earth observation data and analysis // AGU Fall Meeting Abstracts, 2011. Vol. 1. P. 2.

[4] Tolpin V.A., Loupian E.A., Bartalev S.A., Plotnikov D.E., Matveev A.M. Possibilities of agricultural vegetation condition analysis with the "VEGA" satellite service // Atmospheric and Oceanic Optics. 2014. Vol. 27. No. 7 (306). P. 581-586.

[5] Loupian E.A., Proshin A.A., Bourtsev M.A., Balashov I.V., Bartalev S.A., Efremov V.Yu., Kashnitskiy A.V., Mazurov A.A., Matveev A.M., Sudneva O.A., Sychugov I.G., Tolpin V.A., Uvarov I.A. IKI center for collective use of satellite data archiving, processing and analysis systems aimed at solving the problems of environmental study and monitoring // Actual Problems of Remote Sensing of the Earth from Space. 2015. Vol. 12. No. 5. P. 263-284.

[6] Proshin A.A., Loupian E.A., Balashov I.V., Kashnitskiy A.V., Bourtsev M.A. Unified satellite data archive management platform for remote monitoring systems development // Actual Problems of Remote Sensing of the Earth from Space. 2016. Vol. 13. No. 3. P. 9-27. DOI: 10.21046/2070-7401-2016-13-3-9-27.

[7] Loupian E.A., Balashov I.V., Bourtsev M.A., Efremov V.Yu., Kashnitskiy A.V., Kobets D.A., Krasheninnikova Yu.S., Mazurov A.A., Nazipov R.R., Proshin A.A., Sychugov I.G., Tolpin V.A., Uvarov I.A., Flitman E.V. Development of information systems design technologies // Actual Problems of Remote Sensing of the Earth from Space. 2015. Vol. 12. No. 5. P. 53-75.

[8] Efremov V.Yu., Loupian E.A., Mazurov A.A., Proshin A.A., Flitman E.V. A Technology for Construction of Automated Systems for Satellite Data Storage // "Actual Problems of Remote Sensing of the Earth from Space: Physics, Methods and Technologies for Monitoring of Environment and Hazardous Phenomena and Objects", Moscow, Polygraph-Service, 2004. P. 437-443.

[9] Loupian E.A., Milexin O.E., Antonov V.N., Kramareva L.S., Bourtsev M.A., Balashov I.V., Tolpin V.A., Solovyev V.I. System of operation of joint information resources based on satellite data in the Planeta Research Centers for Space Hydrometeorology // Russian Meteorology and Hydrology. 2014. Vol. 39. Issue 12. P. 847-853. DOI: 10.3103/S1068373914120103.
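As an illustration of the session and fragment model from Section 3, the rules for filling the fragments table can be sketched as follows. This is a simplified sketch under stated assumptions, not the actual UNISAT schema: only the four session fields (dt, satellite, station, device) and the three fragment type names come from the paper, while the class layout, the number field convention (0 for records without a fragment number), the fragment_records function and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Session:
    # The four fields that uniquely identify a session (Section 3).
    dt: datetime
    satellite: str
    station: str
    device: str


@dataclass(frozen=True)
class Fragment:
    session: Session
    fragment_type: str  # "single_fragment", "fragment" or "products_contour"
    number: int         # hypothetical convention: 0 when no fragment number applies


def fragment_records(session: Session, n_parts: int) -> list:
    """Build the rows the fragments table would hold for one session."""
    if n_parts < 1:
        raise ValueError("a session has at least one fragment")
    if n_parts == 1:
        return [Fragment(session, "single_fragment", 0)]
    # One numbered record per spatial part of the session.
    rows = [Fragment(session, "fragment", i) for i in range(1, n_parts + 1)]
    # Simplification: a single outer contour per session; the paper allows
    # one contour per product or product group.
    rows.append(Fragment(session, "products_contour", 0))
    return rows


# Example values are made up for illustration.
s = Session(datetime(2019, 1, 1, 12, 0), "TERRA", "example_station", "MODIS")
for row in fragment_records(s, 3):
    print(row.fragment_type, row.number)
```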