<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Development of the computing node for processing satellite imagery and spatial data for earth sciences</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aleksey A. Zagumennov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vera V. Naumova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Automation and Control Processes of FEB RAS</institution>
          ,
          <addr-line>Vladivostok</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vernadsky State Geological Museum of RAS</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>272</fpage>
      <lpage>279</lpage>
      <abstract>
        <p>This work is devoted to the development of a computing node for processing satellite and spatial data for earth sciences, using as an example its implementation as part of the Information and Analytical Environment to support scientific research in geology of the Vernadsky State Geological Museum (SGM RAS). The prerequisites for creating such a computing node and the requirements it must meet to solve geological problems are given. An overview of cloud platforms for accessing and processing satellite and spatial data is presented. Based on the overview, a conceptual diagram of a computing node is proposed and the list of modern technologies required for building it is determined. The developed node provides tools for searching data from external cloud providers, processing them with various built-in and custom algorithms, and visualizing the results. It is an independent web service, although it is part of the Computational and Analytical Geological Environment of SGM RAS and is integrated with its services. This allows a wide range of users to access the data and processing algorithms provided by the computing node, and allows the node to be integrated into other information systems as a third-party application for processing satellite and spatial data.</p>
      </abstract>
      <kwd-group>
        <kwd>Cloud services</kwd>
        <kwd>geology</kwd>
        <kwd>computing node</kwd>
        <kwd>REST API</kwd>
        <kwd>satellite imagery</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Most present-day research in the geosciences cannot be conducted without satellite or spatial
data. Remote sensing with modern satellites provides information about the Earth in many spectral
ranges. Various physical parameters of the surface, ocean and atmosphere are calculated from
this information. Satellite imagery is used in research in meteorology, oceanology, geology, and
other earth sciences. Processed remote sensing data have become crucial in industry, agriculture,
territory management and other areas.</p>
      <p>Problems that involve satellite imagery and spatial data processing require a huge
amount of storage space and computing power in terms of infrastructure, as well as a wide
range of specialized competencies in geoinformatics in terms of staff qualifications.
This fact, along with the constant growth in the number of satellites launched every year,
has led to the emergence of specialized cloud platforms and services for working with satellite
imagery and spatial data. They reduce the cost of solving certain scientific and applied problems
by providing access to data together with tools and algorithms to process it. Scientific
problems alone define a number of basic requirements for such
cloud platforms and services: open access for scientific purposes, simplicity
and flexibility of use for various scientific problems, the possibility of implementing custom
processing algorithms, and reproducibility of results.</p>
      <p>Three main problems arise when working with satellite and spatial data:
1) spatiotemporal search for satellite and spatial data and access to them;
2) interactive data visualization and analysis;
3) data processing with standard or user-defined algorithms.</p>
      <p>
        Modern cloud solutions in this field partially or completely solve these problems. The
problem of access to satellite imagery deserves special mention. The development of satellite
remote sensing and the emergence of various initiatives to provide access to satellite data have made it
possible to create distributed systems offering a wide variety of satellite and spatial
data, so that researchers and other users can access data directly through web interfaces
without first downloading them to local storage. The problems of
working with satellite and spatial data mentioned above, and the contribution of modern cloud platforms to their solution,
are discussed in the related work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        The Information and Analytical Environment to support scientific research
in geology (http://geologyscience.ru), created a few years ago, operates at the Vernadsky State
Geological Museum of the Russian Academy of Sciences (SGM RAS) and provides a single entry point to various
databases and cloud processing services for solving scientific problems in geology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The
environment consists of services for accessing various types of data,
including a service for accessing satellite data (https://sputnik.geologyscience.ru), and the computing
and analytical environment for processing geological data (https://service.geologyscience.ru).
The latter integrates various external computing nodes and cloud platforms for the processing
of quantitative, spatial and text data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>This paper proposes an approach to creating a computing node for processing satellite and spatial
data in the ecosystem of the Information and Analytical Environment of SGM RAS for
solving geological problems.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Overview of cloud platforms to work with satellite and spatial data</title>
      <p>
        There are many cloud platforms and services for working with satellite and spatial data today.
They differ in functionality, type of satellite and spatial information provided, thematic focus,
ease of use, and cost. Many of the platforms can be used for research purposes, but
often with only part of their functionality. Examples of several platforms, their architecture,
and provided features are presented in the related work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We give an overview of the
platforms from the standpoint of their suitability for scientific research involving satellite
and spatial data. All platforms can be divided into two large classes: general-purpose platforms
and thematic ones.
      </p>
      <p>General-purpose platforms make it possible to solve a wide range of tasks by providing access to different
satellite and spatial data, a variety of visualization tools, custom processing algorithms and
mechanisms for sharing results. The main general-purpose platforms are considered
below.</p>
      <p>Google Earth Engine (https://earthengine.google.com) is a cloud platform for analysis and
visualization of geospatial data, free for scientific purposes. The platform aggregates data
from various satellite platforms and instruments. To work with this archive of satellite and
spatial data, it provides tools for visualization and for developing processing algorithms,
powered by the Google Cloud Platform.</p>
      <p>Earth Observing System (https://eos.com) is a commercial cloud platform that provides access
to a wide range of satellite data, including ultra-high-resolution imagery. The platform offers a wide
range of visualization and analytics tools, as well as a set of common algorithms for processing
satellite data for various tasks. Using it for scientific purposes requires signing a special
agreement.</p>
      <p>Planet (https://www.planet.com) is another commercial platform, distinguished
by the fact that the company operates its own constellation of Earth observation satellites,
which makes it possible to track changes in various parameters of the regions
of interest in more detail. In addition to valuable satellite data, the platform has advanced visualization and
change monitoring tools, as well as tools for integration with various satellite data processing
applications. The platform and its data can be used for research and educational purposes.</p>
      <p>Descartes Labs (https://www.descarteslabs.com) is a commercial cloud platform that
aggregates satellite data from various providers and prepares it for further analysis. It has
modern tools for visualization and data analysis, as well as special workspaces where users can
build their own satellite data processing workflows with the computing power of the platform and its Python
programming interface.
It also offers an option for free use for scientific purposes.</p>
      <p>ArcGIS Online (https://www.esri.com/en-us/arcgis/products/arcgis-online/overview) is a
commercial cloud platform from a well-known developer of geographic information systems
(GIS), and is primarily a cloud GIS. It provides a wide range of tools for working with geospatial
data (raster and vector), including analytics and a Python API. Users can upload their own data,
share their results, and use the results of other users of the platform.</p>
      <p>Astraea (https://astraea.earth) is a commercial cloud platform whose distinctive feature is a well-structured
satellite and spatial data processing workflow:
1. Automatic continuous delivery of satellite images of the region of interest from the required
satellites.
2. Creation of custom processing algorithms in the Python programming language using the
JupyterLab environment.
3. Scaling of algorithms to cloud computing nodes using Big Data approaches.
4. Creation of automated analytical tasks using low/no-code approaches.</p>
      <p>The platform provides an opportunity for cooperation in solving research and scientific
problems.</p>
      <p>Sentinel Hub (https://www.sentinel-hub.com) is a commercial platform with extensive
capabilities for research and solving scientific problems. It provides access to data from satellites
of the Sentinel and Landsat constellations, as well as MERIS and Proba-V data, has flexible
visualization tools, and, like Google Earth Engine, allows users to implement their own
scripts for processing satellite data and provides a number of programming interfaces for automated
interaction with the platform. It is also possible to use custom data.</p>
      <p>Thematic platforms are aimed at solving a certain narrow range of tasks, as a rule, limited to
a certain geographic region, providing the most relevant and thoroughly selected set of data,
tools and algorithms for these tasks and region. Here are some examples of thematic platforms.</p>
      <p>USGS Web Applications (https://www.usgs.gov/products/data-and-tools/web-application) is
a collection of thematic web services from the US Geological Survey for solving a wide range of
tasks in areas from geology to climate, mainly for the territory of the USA. The web services use the
USGS cloud data access platform and internal cloud infrastructure. All services and the data access
platform are free for scientific research.</p>
      <p>NASA Earthdata Tools (https://earthdata.nasa.gov/earth-observation-data/tools) is a set
of cloud-based tools from the US National Aeronautics and Space Administration, which provide the following
functionality: data search and ordering, data preprocessing and filtering, geolocation and
cartography, data visualization and analysis. Each of the services in this catalog is focused on
solving a fixed range of tasks from certain domains. The services are also free and are not limited
to the United States region.</p>
      <p>Copernicus Marine Service (https://marine.copernicus.eu) is a cloud-based platform focused
on research of the World Ocean, which is implemented as part of the European Union’s
Copernicus Program. The platform offers access to satellite and spatial data on the state of the ocean, as
well as a range of tools for visualizing and analyzing various parameters of the ocean, thereby
making it possible to solve scientific and applied problems that require this type of data. The platform is free
and covers the entire World Ocean. There are similar thematic platforms within the Copernicus
Program for climate, atmosphere and land studies.</p>
      <p>Digital Earth Australia (https://www.ga.gov.au/dea/products) is a thematic cloud platform
for monitoring various physical parameters in Australia. It combines satellite data and a set of
thematic services that are focused on solving specific problems: from changing coastlines to
determining freshwater reserves. Access to the platform is free.</p>
      <p>Swiss Data Cube (https://www.swissdatacube.org) is a thematic cloud platform similar to the
previous one, only focused on monitoring the territory of Switzerland.</p>
      <p>Brazil Data Cube (http://brazildatacube.org) is a thematic cloud platform focused on
monitoring the territory of Brazil.</p>
      <p>The above review of cloud platforms for working with satellite and spatial data allows us
to conclude that, to solve modern problems, such platforms must, on the one hand,
implement data access, processing and visualization capabilities, and on the other hand, offer
simplicity and flexibility of use in thematic scientific tasks, with the ability to define
data sets and algorithms for their processing. At the same time, general-purpose platforms
have their own specifics and differ in goals, in the set of services provided, and
in the possibilities of using them for research purposes. This explains the large number of such
platforms.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Computing node for processing satellite imagery and spatial data for earth sciences</title>
      <p>Earth observation data are used in modern geology in a wide range of tasks: lineament
analysis, mineral mapping, structural-tectonic analysis of deposits, monitoring of geodynamic
processes, study of permafrost processes, study of the material composition of rocks, rational
land use, etc. Solving these problems requires a certain set of satellite and spatial data and
algorithms for their processing, and this set changes constantly as
methods develop and as the problems themselves change. This
places certain requirements on tools for processing satellite and spatial data, which should
combine the properties of both general-purpose and thematic platforms.</p>
      <p>In terms of data access, it is necessary to search for and obtain data from various
satellites and radiometers, as well as from spatial data catalogs. One solution is to create a data
access gateway. For example, the service for accessing satellite data of the Information and Analytical
Environment of SGM RAS (http://sputnik.geologyscience.ru) searches satellite
data from various providers: the US Geological Survey (http://usgs.gov), the Earth Observation
Research Center of the Japan Aerospace Exploration Agency (https://www.eorc.jaxa.jp), the Goddard
Space Flight Center (https://oceancolor.gsfc.nasa.gov), the Scientific Center for Operational Monitoring of the Earth (http://www.ntsomz.ru),
and the Center for Collective Use of Regional Satellite Environmental Monitoring of the Far Eastern
Branch of the Russian Academy of Sciences (http://satellite.dvo.ru). The service provides access
to the following types of satellite information: data from the Landsat-7/8 and
Sentinel-2A/2B satellites; SRTM and ALOS satellite topography data; data from the meteorological satellites
Aqua and Terra; and data from the EO-1 and OrbView-3 satellites, as well as from the Russian satellite
Kanopus-V.</p>
      <p>Another option for providing access to data is to connect to satellite and spatial data catalogs
that implement the rapidly developing SpatioTemporal Asset Catalog (STAC)
specification (https://stacspec.org). This specification defines how spatiotemporal data and
data collections are described, providing unified access to the data and navigation through
it by means of a self-documenting catalog structure and data description. The main advantage of
the specification for data providers and consumers is uniform access to
data, with no need to change processing workflows and algorithms when new data
types are added.</p>
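      <p>As an illustration of this uniformity, the following minimal sketch builds the JSON body of an item search request as defined by the STAC API specification (parameters bbox, datetime, collections and limit). It assumes that the provider exposes a STAC API search endpoint; a static catalog is instead traversed through its links. The function name, the region of interest and the collection identifier are hypothetical and are not taken from the described node.</p>
      <preformat><![CDATA[
```python
import json
from datetime import date


def build_stac_search(bbox, start, end, collections, limit=10):
    """Build the JSON body of a STAC API item search (POST /search)."""
    if len(bbox) != 4:
        raise ValueError("bbox must be (west, south, east, north)")
    return {
        "bbox": list(bbox),
        # Closed time interval "start/end" with RFC 3339 timestamps
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "collections": list(collections),
        "limit": limit,
    }


body = build_stac_search(
    bbox=(131.5, 42.8, 132.5, 43.5),   # hypothetical region of interest
    start=date(2021, 6, 1),
    end=date(2021, 6, 30),
    collections=["landsat-8-l1"],      # hypothetical collection identifier
)
print(json.dumps(body, indent=2))
```
]]></preformat>
      <p>The same request body can be sent to any catalog that implements the search endpoint, so adding a new data source only changes the endpoint address and the collection identifier.</p>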
      <p>Modern work with satellite and spatial data takes place in web applications through interaction
with a cartographic interface, into which the required data are loaded from cloud
platforms in real time. To support this way of working, special storage formats and some
preprocessing of the source data are required to optimize delivery over the network and rendering
in the browser. For these purposes, the approaches offered by the Open Geospatial Consortium
(OGC) (https://ogcapi.ogc.org) have been used for a long time: Web Map Service, Web Feature
Service, Web Coverage Service, etc. More recently, the Cloud Optimized GeoTIFF (COG)
standard (https://www.cogeo.org) has also come into use for these purposes. This standard
adds to regular GeoTIFF files the ability to store overview images and to split
the original image into smaller chunks for quick access, while maintaining backward compatibility. The only prerequisite
for accessing such data over the network is support for HTTP GET range requests by both the
client and the server.</p>
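      <p>The range request mechanism can be sketched with the Python standard library alone. The example below only prepares a GET request with a Range header and does not send it; the file address is hypothetical, and the helper name range_request is introduced here for illustration.</p>
      <preformat><![CDATA[
```python
from urllib.request import Request


def range_request(url, offset, length):
    """Prepare an HTTP GET asking only for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last = offset + length - 1  # the end position of a byte range is inclusive
    return Request(url, headers={"Range": f"bytes={offset}-{last}"}, method="GET")


# Hypothetical COG address; a client would typically start by reading
# the beginning of the file, where the TIFF header is stored.
req = range_request("https://example.org/scene.tif", 0, 16384)
print(req.get_header("Range"))
```
]]></preformat>
      <p>A server that supports range requests answers with status 206 (Partial Content) and returns only the requested bytes, which is what allows a map client to fetch a single overview or chunk instead of the whole image.</p>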
      <p>
        The overview of thematic cloud platforms for working with satellite and spatial data shows
that an increasing number of such platforms are built on the open-source Open
Data Cube (ODC) software (https://www.opendatacube.org) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. ODC is an open-source suite of geospatial
data management and analysis software built on top of other open technologies. It combines
tools for cataloging data, direct access to data in the form of data cubes (multidimensional
spatiotemporal arrays of measurements), and functions in the Python programming language
for computations, including distributed ones. Thus, ODC can serve as the main system for
general and thematic processing of satellite and spatial data at any scale: from an individual
researcher’s workplace to a cloud platform [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>The modern landscape of platforms for working with satellite and spatial data, together with the set
of technological solutions and standards in geoinformatics, makes it possible to
implement a similar approach when developing a computing node for processing satellite and
spatial data within the ecosystem of the Information and Analytical Environment of SGM RAS.
The schematic diagram of the computing node is shown in Figure 1. The computing node consists
of the following components:
• a cartographic web interface for user interaction with the computing node: searching
for satellite and spatial data for the region and dates of interest, defining a processing
algorithm, and choosing visualization and analysis methods for the result;
• a subsystem for processing and dispatching requests, which provides interaction between
the web application and the rest of the computing node;
• a data access subsystem that provides interaction with STAC-catalogs of external cloud
providers, a service for accessing satellite data of the SGM RAS, and contains service
functions for working with the local ODC catalog, which is required for the ODC to work
properly;
• a data processing subsystem based on the ODC platform, which provides tools for
implementing satellite imagery processing algorithms;
• a task queue, into which incoming requests for data processing are placed together with the
name of the algorithm from the data processing subsystem, and which tracks the
status of task execution;
• task executors that process tasks from the queue in a distributed manner;
• local storage of data processing results in COG format and temporary files.</p>
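      <p>The interaction between the task queue and the task executors can be illustrated by the following minimal sketch. It is not the implementation of the node: an in-memory queue and threads from the Python standard library stand in for the queue database and the distributed executors, and the algorithm registry with its single entry band_sum is hypothetical.</p>
      <preformat><![CDATA[
```python
import queue
import threading
import uuid

# Hypothetical registry standing in for the data processing subsystem
ALGORITHMS = {
    "band_sum": lambda params: sum(params["values"]),
}

tasks = queue.Queue()
status = {}    # task id -> "queued" | "running" | "done" | "failed"
results = {}   # task id -> value returned by the algorithm
lock = threading.Lock()


def submit(algorithm, params):
    """Place a processing request in the queue and return its identifier."""
    task_id = str(uuid.uuid4())
    with lock:
        status[task_id] = "queued"
    tasks.put((task_id, algorithm, params))
    return task_id


def executor():
    """Process tasks from the queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        task_id, algorithm, params = item
        with lock:
            status[task_id] = "running"
        try:
            value = ALGORITHMS[algorithm](params)
            with lock:
                results[task_id] = value
                status[task_id] = "done"
        except Exception:
            # Unknown algorithm or a failure inside it: record and keep working
            with lock:
                status[task_id] = "failed"
        finally:
            tasks.task_done()


workers = [threading.Thread(target=executor) for _ in range(2)]
for w in workers:
    w.start()

ok_id = submit("band_sum", {"values": [1, 2, 3]})
bad_id = submit("unknown", {})

for _ in workers:
    tasks.put(None)   # one sentinel per executor
tasks.join()
for w in workers:
    w.join()

print(status[ok_id], results[ok_id], status[bad_id])
```
]]></preformat>
      <p>Because the request handler only places tasks in the queue and reads their status, a failure in one executor does not affect the acceptance of new requests, and the number of executors can be changed independently.</p>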
      <p>Calculations are performed in the Python environment using the ODC package and several
auxiliary packages for working with geospatial data. Incoming data processing requests
are handled by a REST API built with the FastAPI framework and described
according to the OpenAPI standard (http://spec.openapis.org/oas/v3.0.3); computational tasks are placed in a
queue backed by the Redis NoSQL database.</p>
      <p>Moreover, the individual components of the system (the web interface, the subsystems, the local
ODC catalog, the database with the task queue, and the task executors) are deployed in Docker
containers. This architecture separates request handling from heavy computation on large
amounts of data, which provides fault tolerance and scalability of the
computing node.</p>
      <p>The current implementation of the proposed conceptual scheme is a prototype with a
cartographic web interface that implements data search in the STAC catalog of the Landsat-8 satellite
(https://landsat-stac.s3.amazonaws.com), as well as algorithms for calculating various spectral
indices from satellite channels.</p>
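      <p>The text does not name the indices implemented in the prototype. As a representative example, the sketch below computes the normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), which for Landsat-8 is derived from the near-infrared (band 5) and red (band 4) channels. Plain Python lists are used to keep the example self-contained, and the reflectance values are made up for illustration.</p>
      <preformat><![CDATA[
```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized 2-D grids."""
    if len(nir) != len(red) or any(len(a) != len(b) for a, b in zip(nir, red)):
        raise ValueError("bands must have the same shape")
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            # No signal in either band: mark the pixel as missing
            row.append((n - r) / total if total != 0 else None)
        out.append(row)
    return out


# Made-up reflectance values, not real measurements
nir_band = [[0.5, 0.4], [0.0, 0.3]]
red_band = [[0.1, 0.4], [0.0, 0.1]]
result = ndvi(nir_band, red_band)
print(result)
```
]]></preformat>
      <p>In a data cube setting the same formula would be applied to whole arrays of measurements at once; the per-pixel loop is shown only to make the calculation explicit.</p>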
    </sec>
    <sec id="sec-4">
      <title>4. Further work</title>
      <p>Further development of the computing node is planned in three directions:
• extending the functionality of the web interface by adding new visualization and data
analysis tools;
• developing the data access subsystem by adding the ability to connect arbitrary STAC
catalogs, as well as integration with the satellite data access service of SGM RAS;
• expanding the list of algorithms provided by the data processing subsystem and
making it possible to add custom algorithms.</p>
      <p>In addition, the approaches and modular architecture on which the computing node is based will make
it possible to transform it into an independent cloud platform for solving a wide range of tasks
in the earth sciences. Its ease of use and flexibility of tuning for specific tasks are expected to attract a
wide range of scientists and researchers.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The study is supported by the Government contract No. 0140-2019-0005 “Development of an
information environment for integrating data from natural science museums and services for
their processing for Earth sciences”.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Sudmanns</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tiede</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lang</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergstedt</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trost</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Augustin</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baraldi</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blaschke</surname>
            <given-names>T.</given-names>
          </string-name>
          <article-title>Big Earth data: Disruptive changes in Earth observation data management and analysis?</article-title>
          // <source>International Journal of Digital Earth</source>.
          <year>2020</year>
          . Vol.
          <volume>13</volume>
          . Is. 7. P.
          <fpage>832</fpage>
          -
          <lpage>850</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Eremenko</surname>
            <given-names>V.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naumova</surname>
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Platonov</surname>
            <given-names>K.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyakov</surname>
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eremenko</surname>
            <given-names>A.S.</given-names>
          </string-name>
          <article-title>The main components of a distributed computational and analytical environment for the scientific study of geological systems</article-title>
          // <source>Russian Journal of Earth Sciences</source>.
          <year>2018</year>
          . Vol.
          <volume>18</volume>
          . No. 6. ES6003. DOI: 10.2205/2018ES000636.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Eremenko</surname>
            <given-names>V.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naumova</surname>
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zagumennov</surname>
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bulov</surname>
            <given-names>S.V.</given-names>
          </string-name>
          <article-title>Cloud technologies for development of geographically distributed computational and analytical geological environment</article-title>
          // <source>Computational Technologies</source>.
          <year>2021</year>
          . Vol.
          <volume>26</volume>
          . Is. 1. P.
          <fpage>86</fpage>
          -
          <lpage>98</lpage>
          . DOI: 10.25743/ICT.2021.26.1.007.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Gomes</surname>
            <given-names>V.C.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Queiroz</surname>
            <given-names>G.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferreira</surname>
            <given-names>K.R.</given-names>
          </string-name>
          <article-title>An overview of platforms for big earth observation data management and analysis</article-title>
          // <source>Remote Sensing</source>.
          <year>2020</year>
          . Vol.
          <volume>12</volume>
          . Is. 8. P.
          <fpage>1253</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Dhu</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giuliani</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Juárez</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kavvada</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Killough</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merodio</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minchin</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramage</surname>
            <given-names>S.</given-names>
          </string-name>
          <article-title>National open data cubes and their contribution to country-level development policies and practices</article-title>
          // <source>Data</source>.
          <year>2019</year>
          . Vol.
          <volume>4</volume>
          . Is. 4. P.
          <fpage>144</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Giuliani</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chatenoux</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piller</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moser</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lacroix</surname>
            <given-names>P.</given-names>
          </string-name>
          <article-title>Data cube on demand (DCoD): Generating an earth observation data cube anywhere in the world</article-title>
          // <source>International Journal of Applied Earth Observation and Geoinformation</source>.
          <year>2020</year>
          . Vol.
          <volume>87</volume>
          . DOI: 10.1016/j.jag.2019.102035.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>