Information System of Air Quality Assessment Using Data
Interpolation from Ground Stations
Bohdan Molodets1, Volodymyr Hnatushenko2, Daniil Boldyriev1, Tetiana Bulana2
1
    Oles Honchar Dnipro National University, 35 av. Dmytra Yavornytskoho Dnipro, 49044, Ukraine
2
    Dnipro University of Technology, 19 av. Dmytra Yavornytskoho, Dnipro, 49005, Ukraine


                Abstract
                Monitoring ground stations is crucial for creating interactive maps that assist in assessing air
                quality. A developed information system can aggregate and process the data obtained, which
                is then transformed into a unified format and used as input data for interpolation methods that
                create raster imagery. After processing, the data is stored in Amazon Simple Storage Service
                or database and can be retrieved using application program interfaces (APIs). The proposed
                architectural solution for creating the system includes a toolkit that can work with different
                volumes of data with ease. Using Docker during deployment provides additional capabilities
                for creating a flexible and scalable system. Specific tools such as PostGIS and the Geospatial
                Data Abstraction Library (GDAL) simplify data processing. For instance, GDAL handles the
                interpolation, cropping, and tiling of the air quality raster image. The article describes the
                structure of the client part and the interface in detail. Using the Mapbox Graphics Library
                (Mapbox GL), the system visualizes large volumes of station data as a vector layer, helping
                users recognize hazardous zones and find safe places.

                Keywords
                Information system, air quality monitoring, docker, inverse distance weighting, data
                visualization

1. Introduction
    Man-made disasters, such as industrial accidents or transportation incidents, can significantly impact
air quality and lead to harmful health effects for nearby communities [1]. Efforts to prevent and mitigate
these disasters are essential to ensure a safe and healthy environment for all. Air pollution is the biggest
environmental health risk in Europe and has a significant impact on the health of the European
population, especially in urban areas. Although emissions of major air pollutants and their
concentrations in ambient air have decreased significantly over the past two decades in Europe, air
quality remains poor in many regions. In October 2022, the European Commission proposed a revision
of the Ambient Air Quality Directive, which included several key points:
    • Stricter thresholds for pollution that are more in line with the new limits set by the World Health
    Organization.
    • Improved access to justice and the right to clean air.
    • Strengthened rules for air quality monitoring to support preventive action and targeted measures.
    • Requirements to improve air quality modeling, especially in areas with poor air quality.
    • Better public information [2].
    A system is therefore needed that displays current air quality and marks the areas that are
hazardous for people to stay in. Based on the collected historical and current data, the system should
report the pollution level at any given point.

MoMLeT+DS 2023: 5th International Workshop on Modern Machine Learning Technologies and Data Science, June 3, 2023, Lviv, Ukraine
EMAIL: bogdan.molodets@gmail.com (B. Molodets); vvgnat@ukr.net (V. Hnatushenko); boldyrov@gmail.com (D. Boldyriev);
tatyana.bulanaya@gmail.com (T. Bulana).
ORCID: 0000-0002-7802-389X (B. Molodets); 0000-0003-3140-3788 (V. Hnatushenko); 0000-0002-8502-1446 (D. Boldyriev); 0000-
0001-6346-3326 (T. Bulana).
             © 2023 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)
    The developed information system processes data collected from ground stations around the
world and calculates the AQI (Air Quality Index) interpolation layer in real time. The novelty of this
article is a methodology for building an information system on top of mathematical models that
describe the spread of pollution. As part of this work, the Weather Research and Forecasting (WRF)
model was set up for Ukraine, and its forecasts were compared with those of an autoregressive
integrated moving average (ARIMA) model. The created system is used for data storage, modeling, and
air quality forecasting. It can also help government bodies detect problem areas caused by, among
others, the energy and resource extraction industries. Aggregating and visualizing the processed data
makes it possible to assess and compare the results of environmental reforms and innovations
implemented in the regions.

2. Related works
    Many studies have examined methods for assessing air quality and the factors
that affect it most. A number of them address the lack of timely data
[3-8]. In paper [9], the authors compared local and global methods for calculating the air quality index
using the example of the city of Dhaka. The findings of this study suggest that the local AQI more
accurately reflects the air quality in the corresponding study area under certain real-world scenarios. In
[10], the authors classified and analyzed data on monitoring emissions of harmful substances in the
Lviv region and created a series of environmental maps based on atmospheric air monitoring data in
the Lviv region. In paper [11], the authors propose a method for predicting air pollution concentrations
in unmonitored areas using spatial interpolation tools in GIS. This provides valuable information on the
dispersion of air pollution. The study in [12] presents recent spatiotemporal changes in the AQI and air
quality in Chongqing's main urban area. A complete dataset was reconstructed using a novel method
(LRTC-TNN) to interpolate missing values. In paper [13], the authors compare a model that uses the
highest sub-index depending on USEPA pollutants standards with a model that includes the weights of
all pollutants as an aggregated air quality index (AAQI) model. The AAQI is a comprehensive indicator
of air quality and is more useful in environmental management as it represents multiple air quality
pollutants. In [14], the authors focus on the management strategy of the environmental ecosystem under
the Artificial Intelligence (AI) algorithm and explore the correlation between air quality and
meteorology. The experimental analysis reveals that the average temperature has a positive correlation
with the AQI, while relative humidity and wind speed have a negative correlation with AQI.
Furthermore, the proposed RF + BP + GA model's prediction error for AQI is no more than 0.32,
indicating a close fit to the actual values. In paper [15], the authors show that the
agricultural sector has a significant impact on the air quality index, using the AgrImOnIA (Agriculture
Impact On Italian Air) framework. This framework assesses the role of the livestock sector in air quality
in the Lombardy region and enables comparisons with other European regions.
    Earth remote sensing has become a powerful tool for various application areas such as soil moisture
assessment [16], agricultural monitoring, and air quality assessment. Numerous satellite data products
indicate the levels of criteria air pollutants (e.g., PM2.5 and NO2) and greenhouse gases (e.g., CH4 and
CO2). An example of such a satellite is Sentinel-5. It is an atmospheric monitoring mission under the
European Copernicus program, formerly the GMES (Global Monitoring for Environment and Security)
program [17]. It provides accurate measurements of major atmospheric components such as nitrogen
dioxide, carbon monoxide, ozone, methane, formaldehyde, sulfur dioxide, and aerosol properties.
However, most satellite instruments cannot distinguish pollution near the surface of the earth from
pollution in the upper layers of the atmosphere.

3. Air quality index calculation
   The Air Quality Index (AQI) is a conventional, dimensionless measure used by government agencies
and private companies to inform citizens about the level of air pollution. Different countries have
their own air quality indices in accordance with national standards.
   In general, the lower the AQI, the better the air quality. As the index increases, a growing
part of the population faces health risks caused by polluted air.
   In other words, the AQI expresses how polluted the air is at the current time (or how polluted it
is forecast to be in the near future). The current information system uses the US AQI, developed by
the EPA (Environmental Protection Agency), because it is easy to interpret. It is divided into six
categories, each of which has its own impact on health.
   Its formula usually takes into account six main pollutants:
   •     particulate matter PM2.5;
   •     particulate matter PM10;
   •     carbon monoxide (CO);
   •     sulfur dioxide (SO2);
   •     nitrogen dioxide (NO2);
   •     ozone (O3).
   For each of these pollutants, the Environmental Protection Agency has established national air
quality standards aimed at protecting public health. Table 1 below shows the levels of indices and
explanations:

Table 1
AQI categories with description
Name                            Value range   Description
Good                            0-50          Air quality is considered satisfactory. Air pollution poses little or no
                                              risk.
Moderate                        51-100        Air quality is acceptable; it is important to note that for certain
                                              pollutants, there may be a moderate health concern for a very
                                              small number of individuals who are particularly sensitive to air
                                              pollution.
Unhealthy for sensitive group   101-150       Individuals who are part of sensitive groups may experience
                                              certain health effects due to air pollution, while the general public
                                              is not likely to be affected.
Unhealthy                       151-200       Although everyone may experience some health effects due to air
                                              pollution, individuals who are part of sensitive groups may
                                              experience more severe or serious health effects.
Very unhealthy                  201-300       Hazardous for your health!
Hazardous                       >300          Hazardous for your health (emergency conditions)! The entire
                                              population is at risk of being affected by the health effects of air
                                              pollution.

    The index expresses the health risk associated with each pollutant at a given point in time.
The overall Air Quality Index is the highest of the individual pollutant indices, each of which is
calculated as follows:
    •    Identify the highest concentration among all monitors in each reporting area and truncate it
    as follows: CO (ppm) – to 1 decimal place, ozone (ppm) – to 3 decimal places, PM10 (µg/m3) – to
    an integer, PM2.5 (µg/m3) – to 1 decimal place, NO2 (ppb) – to an integer, SO2 (ppb) – to an
    integer.
    •    Using Table 2, find the breakpoint interval that contains this concentration;
    •    Using equation (1), calculate the value of the index:
$$I_p = \frac{I_{Hi} - I_{Lo}}{BP_{Hi} - BP_{Lo}} \left( C_p - BP_{Lo} \right) + I_{Lo}, \tag{1}$$
where I_p is the index value for pollutant p, C_p is the truncated concentration of pollutant p, BP_Hi is
the concentration breakpoint that is greater than or equal to C_p, BP_Lo is the concentration breakpoint
that is less than or equal to C_p, I_Hi is the AQI value corresponding to BP_Hi, and I_Lo is the AQI
value corresponding to BP_Lo;
   •      Round the index to the nearest integer.

Table 2
Breakpoints for the AQI
O3 (ppm)        O3 (ppm)        PM2.5 (μg/m3)   PM10 (μg/m3)   CO (ppm)      SO2 (ppb)    NO2 (ppb)     AQI
8-hour          1-hour          24-hour         24-hour        8-hour        1-hour       1-hour
0.000 - 0.054   -               0.0 - 12.0      0 - 54         0.0 - 4.4     0 - 35       0 - 53        0 - 50
0.055 - 0.070   -               12.1 - 35.4     55 - 154       4.5 - 9.4     36 - 75      54 - 100      51 - 100
0.071 - 0.085   0.125 - 0.164   35.5 - 55.4     155 - 254      9.5 - 12.4    76 - 185     101 - 360     101 - 150
0.086 - 0.105   0.165 - 0.204   55.5 - 150.4    255 - 354      12.5 - 15.4   186 - 304    361 - 649     151 - 200
0.106 - 0.200   0.205 - 0.404   150.5 - 250.4   355 - 424      15.5 - 30.4   305 - 604    650 - 1249    201 - 300
-               0.405 - 0.504   250.5 - 350.4   425 - 504      30.5 - 40.4   605 - 804    1250 - 1649   301 - 400
-               0.505 - 0.604   350.5 - 500.4   505 - 604      40.5 - 50.4   805 - 1004   1650 - 2049   401 - 500
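
The calculation described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual code: it covers only PM2.5 and PM10, takes the breakpoints from Table 2, and the names `BREAKPOINTS`, `TRUNCATION`, `sub_index`, and `aqi` are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

# Breakpoints copied from Table 2 as (BP_Lo, BP_Hi, I_Lo, I_Hi).
# Only two pollutants are included to keep the sketch short.
BREAKPOINTS = {
    "pm25": [
        (0.0, 12.0, 0, 50), (12.1, 35.4, 51, 100), (35.5, 55.4, 101, 150),
        (55.5, 150.4, 151, 200), (150.5, 250.4, 201, 300),
        (250.5, 350.4, 301, 400), (350.5, 500.4, 401, 500),
    ],
    "pm10": [
        (0, 54, 0, 50), (55, 154, 51, 100), (155, 254, 101, 150),
        (255, 354, 151, 200), (355, 424, 201, 300),
        (425, 504, 301, 400), (505, 604, 401, 500),
    ],
}

# Truncation step per pollutant: PM2.5 to 1 decimal place, PM10 to an integer.
TRUNCATION = {"pm25": "0.1", "pm10": "1"}


def truncate(value, step):
    """Truncate (not round) a concentration to the given decimal step."""
    return float(Decimal(str(value)).quantize(Decimal(step), rounding=ROUND_DOWN))


def sub_index(pollutant, concentration):
    """Apply equation (1) to one pollutant and round to the nearest integer."""
    c = truncate(concentration, TRUNCATION[pollutant])
    for bp_lo, bp_hi, i_lo, i_hi in BREAKPOINTS[pollutant]:
        if bp_lo <= c <= bp_hi:
            return round((i_hi - i_lo) / (bp_hi - bp_lo) * (c - bp_lo) + i_lo)
    raise ValueError(f"{pollutant} concentration {c} is outside Table 2")


def aqi(concentrations):
    """Overall AQI: the highest sub-index among the supplied pollutants."""
    return max(sub_index(p, c) for p, c in concentrations.items())
```

For example, `sub_index("pm25", 35.9)` falls into the 35.5 - 55.4 interval and yields 102, and `aqi({"pm25": 35.9, "pm10": 40})` also returns 102 because the PM10 sub-index (37) is lower.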


4. Weather prediction
    Weather Research and Forecasting (WRF), a next-generation numerical weather prediction system
designed for both atmospheric research and operational forecasting, was chosen as the weather
forecasting model. WRF can perform simulations based on actual atmospheric conditions or idealized
conditions. It offers a flexible and computationally efficient platform for operational forecasting
that reflects the latest advances in physics, numerical methods, and data assimilation contributed by
a wide community of researchers [18]. The WRF workflow is shown in Figure 1.


Figure 1: WRF scheme of work

    To tune the model, its forecasts for past periods were compared with the actual readings of
the state weather stations of Ukraine. Interpolation between these points was carried out in Python.
The model output was compared with data from real sensors for the following parameters:
temperature (2 m above the ground), U wind, V wind, humidity, and pressure.
    The results of the predictive model were then compared with the forecast of autoregressive
models, which are used to describe stationary stochastic processes. The ARMA (autoregressive moving
average) model is a combination of the AR and MA models. AR(p) is a model in which the value of the
process at a given time depends on its values at the p previous time steps [19]:
$$y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \dots + \beta_p y_{t-p} + \varepsilon_t, \tag{2}$$
where β1, β2, …, βp are constants and εt is a random error.
    The MA model represents a stationary process as a linear combination of consecutive white noise
values. This model is useful as a supplement to autoregression models for a more detailed description
of the noise component. The model is expressed by the following equation:
$$y_t = \varepsilon_t - \gamma_1 \varepsilon_{t-1} - \gamma_2 \varepsilon_{t-2} - \dots - \gamma_q \varepsilon_{t-q}, \tag{3}$$
where γ1, γ2, …, γq are the model parameters.
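    For reference, replacing the single error term of (2) with the moving average expression of (3) gives the general ARMA(p, q) model. The form below is a standard textbook statement written with the same symbols and sign convention as (2) and (3); it is not reproduced from the cited source:

$$y_t = \sum_{i=1}^{p} \beta_i y_{t-i} + \varepsilon_t - \sum_{j=1}^{q} \gamma_j \varepsilon_{t-j}$$

Setting q = 0 recovers the AR(p) model (2), and setting p = 0 recovers the MA(q) model (3).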
   The autocorrelation plot in Figure 2 shows no trend in the data, which allows
us to conclude that the investigated series is stationary (a stationary series must have a constant
mean and must oscillate around this mean with constant variance).


Figure 2: Autocorrelation and partial autocorrelation plot
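
A plot such as Figure 2 is built from the sample autocorrelation function. The following is a minimal pure-Python sketch of that quantity, given here only to make the definition concrete; the function name `autocorrelation` is hypothetical, and in practice a statistics library would normally be used.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        # Covariance between the series and its copy shifted by `lag`,
        # normalized by the lag-0 value so that the result at lag 0 is 1.
        sum(deviations[t] * deviations[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]
```

For a stationary series the values typically decay quickly towards zero as the lag grows, whereas a series with a trend shows slowly decaying, persistently positive values.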

    ARIMA (autoregressive integrated moving average) extends ARMA by differencing the raw
observations to make a time series stationary. Since the series here is already stationary, the
ARIMA model is used with a degree of differencing of 0. ARIMA has three input parameters:
    • p – order of autoregression model;
    • d – degree of differencing;
    • q – order of moving average model.
    An ARIMA(3, 0, 3) model was used to predict temperature, and its forecast was compared with that of WRF.


Figure 3: Forecast comparison
   Figure 3 shows that the difference between the WRF and ARIMA forecasts is less than one degree, so
the root mean square error (RMSE) was calculated to assess the quality of each prediction: the WRF-GFS
model has an RMSE of approximately 1.2858, while the ARIMA model has 1.33. WRF is therefore slightly
more accurate than the ARIMA model, and it can be improved further by postprocessing such as
Multi-sensor Advection Diffusion, which uses satellite observations to detect clouds and then advects
and diffuses them [20].
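
The root mean square error used for this comparison can be computed as in the sketch below. The function name `rmse` and the sample numbers in the usage note are illustrative assumptions and are not the data behind Figure 3.

```python
import math


def rmse(forecast, observed):
    """Root mean square error between two equally long sequences."""
    if len(forecast) != len(observed) or len(forecast) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

With made-up values, `rmse([3.0, 3.0], [0.0, 0.0])` gives 3.0. The model with the lower RMSE against the station readings is considered the more accurate one.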

5. Interpolation methods
    Interpolation tools create a continuous (predicted) surface from values measured at
anchor points. Measuring the height, magnitude, or concentration of the observed objects and
phenomena at every point of the studied territory is usually difficult or very expensive. Instead, the
system can measure indicators at anchor points distributed over the surface and predict the values
at all other locations. Input points can be located either on a regular grid or randomly.
    Deterministic interpolation methods calculate the result based on the measured values that fall in
the vicinity of the interpolated point and on given mathematical formulas that determine the
smoothness of the resulting surface. Deterministic methods include inverse distance weighting (IDW),
nearest neighbor, moving average, and linear interpolation. Geostatistical methods are based on
statistical models that include the analysis of autocorrelation (statistical relations between the
measured points). As a result, geostatistical methods not only create a surface of predicted values but
also make it possible to estimate the accuracy of the prediction. Kriging and its modifications are
among the best-known geostatistical interpolation methods.
    Inverse distance weighting is a weighted average interpolator. Its input consists of the
values of the scattered data, including the coordinates of each data point, and the geometry of the
output grid. The function calculates the interpolated value for each position in the output grid. For
each grid node, the resulting value Z is calculated using the formula:
$$Z = \frac{\sum_{i=1}^{n} \frac{Z_i}{r_i^p}}{\sum_{i=1}^{n} \frac{1}{r_i^p}}, \tag{4}$$
where Z_i is the known value at point i, r_i is the distance from the grid node to point i, p is the
weighting power, and n is the number of points in the search ellipse. The smoothing parameter s is used
as an additive term in the Euclidean distance calculation:
$$r_i = \sqrt{r_{ix}^2 + r_{iy}^2 + s^2}, \tag{5}$$
where r_ix and r_iy are the horizontal and vertical distances between the grid node and point i.
  In this method the weighting factor w is
$$w = \frac{1}{r^p}. \tag{6}$$
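
Equations (4)-(6) can be sketched for a single grid node as follows. This is a minimal pure-Python illustration, not the GDAL implementation: it uses all supplied points rather than only those inside a search ellipse, and the function name `idw` and its parameter names are hypothetical.

```python
import math


def idw(node_x, node_y, points, power=2.0, smoothing=0.0):
    """Inverse distance weighted value at one grid node.

    `points` is an iterable of (x, y, value) tuples for the known stations.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, value in points:
        # Equation (5): Euclidean distance with the smoothing term s.
        r = math.sqrt((node_x - x) ** 2 + (node_y - y) ** 2 + smoothing ** 2)
        if r == 0.0:
            # The node coincides with a station, so its value is used directly.
            return value
        weight = 1.0 / r ** power  # equation (6)
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("at least one known point is required")
    return numerator / denominator  # equation (4)
```

For example, a node at (0, 0) with stations at distance 1 (value 10) and distance 2 (value 40) and p = 2 gets weights 1 and 0.25, so the result is (10 + 10) / 1.25 = 16. A larger power makes nearby stations dominate, while a non-zero smoothing value flattens sharp peaks around stations.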
    To estimate the pollution index at a point that does not coincide with the location of a
station reporting pollutant data, the IDW interpolation algorithm was chosen, specifically
its implementation in GDAL. GDAL is a translator library that can read and write geospatial data in
both raster and vector formats. It provides a unified data model for these formats, allowing applications
to work with a variety of different geospatial data formats through a single API. In addition to the
library, GDAL also includes a number of command-line utilities for manipulating and processing
geospatial data [21].
    All operations on vector and raster data are carried out in the virtual memory of the library, which
makes it possible to avoid writing files to the hard disk. Given a polygon and the type of
pollution index to calculate, the system produces a raster with the interpolation results, which is
shown in Figure 4.
Figure 4: Interpolation of the air quality index (points - location of ground stations)
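
One possible way to run GDAL's IDW interpolation without touching the disk is sketched below. It is an assumption about how such a step could look, not the system's actual code: the `/vsimem/` paths, the `aqi` attribute name, and the function names are hypothetical, and the station data are assumed to be available as a GeoJSON dictionary of points.

```python
import json


def idw_algorithm(power=2.0, smoothing=0.0):
    """Build the algorithm string for GDAL's inverse distance gridding."""
    return f"invdist:power={power}:smoothing={smoothing}"


def interpolate_in_memory(stations_geojson, bounds, width, height, zfield="aqi"):
    """Grid station values with IDW using GDAL's in-memory file system.

    `bounds` is [ulx, uly, lrx, lry]; `zfield` names the GeoJSON property
    that holds the value to interpolate (hypothetical here).
    """
    from osgeo import gdal  # imported lazily; requires the GDAL Python bindings

    src_path = "/vsimem/stations.geojson"  # in-memory input, nothing on disk
    dst_path = "/vsimem/aqi.tif"           # in-memory output raster
    gdal.FileFromMemBuffer(src_path, json.dumps(stations_geojson).encode("utf-8"))

    raster = gdal.Grid(
        dst_path,
        src_path,
        format="GTiff",
        width=width,
        height=height,
        outputBounds=bounds,
        zfield=zfield,
        algorithm=idw_algorithm(),
    )
    values = raster.ReadAsArray()
    raster = None  # close the dataset before removing the in-memory files

    gdal.Unlink(src_path)
    gdal.Unlink(dst_path)
    return values
```

Keeping both the input and the output under `/vsimem/` means the interpolation leaves no temporary files behind, which matches the goal of avoiding writes to the hard disk.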

   The following operations were performed on the raster image:
   • cropping regions where there is no data from ground stations (countries in Africa, the Middle
   East, South America) and bodies of water (seas, rivers, oceans), shown in Figure 5;
   • setting the color scheme;
   • cutting the raster image into tiles (folders of files, each loaded at a particular map zoom level)
   for faster rendering.
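
To illustrate how a tiled raster is addressed, the sketch below computes which tile covers a given coordinate at a given zoom level. It assumes the common Web Mercator XYZ scheme (origin in the top-left corner) and a `zoom/x/y.png` folder layout; the source does not state which tiling scheme or file format the system uses, and the function names are hypothetical.

```python
import math


def tile_index(lat, lon, zoom):
    """Return the (x, y) index of the XYZ tile containing the coordinate."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that coordinates on the map edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_path(lat, lon, zoom):
    """Relative path of the tile file in an assumed zoom/x/y.png layout."""
    x, y = tile_index(lat, lon, zoom)
    return f"{zoom}/{x}/{y}.png"
```

Because each zoom level has its own folder, the client only requests the few tiles visible in the current viewport instead of the whole raster.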


Figure 5: System displays interpolation results as a bitmap layer on the map

  A mask is used to exclude areas that do not have enough points to produce a result, so that users
are not confused by abnormal values for those regions.

6. Development techniques
   Python 3.7 was chosen to visualize the intermediate results and to format the source files of the
system. Because Python programs can be split into modules, their code can be reused in other
programs. The large standard library is a solid foundation for new applications, and the large
community makes it easy to get started with the language. Standard modules include many convenient
tools for working with files, system calls, network connections, and interfaces to various graphical
libraries.
    Django is a high-level Python web framework that aims to help developers build web
applications as efficiently as possible, from concept to completion. One of Django's core focuses is
security, and it provides developers with tools to help avoid common security pitfalls and
vulnerabilities. It follows the model-template-view (MTV) architectural pattern. Django's
main goal is to make it easier to create complex, database-driven websites. The framework emphasizes
the reuse and pluggability of components, less code, low coupling, rapid development, and
SOLID principles. Django also provides an optional administrative interface for creating, reading,
updating, and deleting data, which is generated dynamically through introspection and configured via
admin models.
    Django is compatible with several web servers such as Apache, Nginx using WSGI, Gunicorn or
Cherokee using flup. It can also be run with a FastCGI server which supports web servers like Lighttpd
or Hiawatha. In addition, other WSGI-compatible web servers can be used. The framework supports
four databases: PostgreSQL, MySQL, SQLite, and Oracle, while Microsoft SQL Server can be used
with django-mssql on Microsoft operating systems. External tools are also available for IBM Db2, SQL
Anywhere and Firebird [22]. PostgreSQL, which is one of the officially supported databases, is an open-
source object-relational database that extends the SQL language with many features to safely store and
scale complex data loads.
    PostgreSQL is widely recognized for its robust architecture, high level of reliability, and its ability
to maintain data integrity. Its feature set is also considered to be highly dependable. As a result,
PostgreSQL has earned a strong reputation in the industry. It runs on all major operating
systems, has been ACID-compliant since 2001, and has powerful add-ons such as the popular PostGIS
geospatial database extender. Not surprisingly, PostgreSQL has become the open-source relational
database of choice for many people and organizations [23]. PostgreSQL comes with many
features aimed at helping developers build applications and helping administrators protect data
integrity, build fault-tolerant environments, and manage data no matter how large or small the dataset
is. PostgreSQL tries to conform to the SQL standard where such conformance does not contradict
traditional features or lead to poor architectural decisions. Many of the features required by the
SQL standard are supported, although sometimes with slightly different syntax or behavior.
    PostGIS is a spatial database extender for PostgreSQL that enables the storage and manipulation of
spatial data, such as points, lines, and polygons. It enhances PostgreSQL with new types (such as
geometry, geography, and raster), functions, operators, and indexes that are specifically designed for
spatial data. With PostGIS, users can perform spatial queries, manipulate spatial data, and perform
complex spatial analysis using SQL. It is widely used in geographic information systems (GIS), web
mapping applications, and location-based services. The features of PostGIS 2+ include:
    • Processing and analysis functions for both vector and raster data for splicing, morphing,
    reclassifying, and collecting/merging with the power of SQL;
    • SQL-callable spatial reprojection functions for both vector and raster data;
    • Support for importing and exporting ESRI shapefile vector data using both command line and GUI
    tools, and support for other formats via third-party open-source tools;
    • Command line tools for importing raster data from many standard formats: GeoTiff, NetCDF, PNG,
    JPG;
    • Functions for rendering and importing vector data in standard text formats such as KML,
    GML, GeoJSON, GeoHash, and WKT using SQL;
    • SQL functions for extracting pixel values by geometric region, computing statistics by region,
    clipping rasters by geometry, and vectorizing rasters;
    • Support for network topology.
    Docker Engine is an open-source containerization technology for building and containerizing
applications. It acts as a client-server application consisting of a server with a long-running daemon
process, APIs that specify the interfaces programs can use to communicate with and instruct the Docker
daemon, and a command line interface (CLI) client.
    The Command Line Interface utilizes the Docker API to manage and communicate with the Docker
daemon through scripts or direct CLI commands. Other Docker applications also use APIs and CLIs as
their basic interface. The Docker daemon is responsible for creating and managing Docker objects, such
as images, containers, networks, and volumes.
Docker Compose is a tool that enables the description and execution of multi-container Docker
applications. With Compose, you can configure the services for your application using a YAML file.
Once configured, you can create and run all of the services from your configuration with a single
command.
    Compose has commands to manage the entire lifecycle of your application, including:
    • start, stop, and rebuild services;
    • view the status of running services;
    • stream the log output of running services;
    • run a one-off command on a service.
    Memcached is an in-memory key-value store for small chunks of arbitrary data (strings,
objects) produced by database calls, API calls, or page rendering. Its simple design promotes quick
deployment and ease of development, and it solves many problems facing large data caches. Its API is
available for most popular languages.
    The server part is divided into three components. The first is the backend on Django. The second
contains the Docker settings for deployment. The third is the frontend. Git was selected as the version
control system, with a separate repository for each of the three components, connected using submodules.
The deployment component includes Dockerfiles and script files that are responsible for running each
container. All containers are described in the docker-compose file, including the cache, frontend service,
Django application, database, worker database, and more.
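
As an illustration of how a Django application container can reach the database and cache containers, a settings fragment might look like the sketch below. The service host names `db` and `cache`, the environment variable names, and the default database name are assumptions; the source does not list the actual configuration. The `PyMemcacheCache` backend shown here is available in Django 3.2 and later.

```python
import os

# Hypothetical fragment of a Django settings module.
DATABASES = {
    "default": {
        # PostGIS-enabled PostgreSQL backend shipped with Django (GeoDjango).
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": os.environ.get("POSTGRES_DB", "airquality"),
        "USER": os.environ.get("POSTGRES_USER", "postgres"),
        "PASSWORD": os.environ.get("POSTGRES_PASSWORD", ""),
        # Inside a Compose network, containers reach each other by service name.
        "HOST": os.environ.get("POSTGRES_HOST", "db"),
        "PORT": os.environ.get("POSTGRES_PORT", "5432"),
    }
}

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": os.environ.get("MEMCACHED_LOCATION", "cache:11211"),
    }
}
```

Reading connection details from environment variables keeps the same image usable across deployments, since only the Compose file or its environment needs to change.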

7. Description of the information system
    The client part is implemented with the Angular framework, which provides the following
advantages: a hierarchical structure of modules [24] and a division of the client into three layers:
presentation (responsible for displaying data in the system), abstraction (mediates the interaction
between the core and presentation layers), and core (responsible for working with data). This suits a
single-page application (SPA), in which the client and server exchange data rather than views. The
system also implements data caching, which speeds up data exchange between the client and the server.
    The components themselves are divided into "smart" and "dumb" components: smart components
can perform data manipulations, access the API, and so on, while dumb components only
display data received from the parent component. This approach gave the following advantages:
    • reusability, which is considered the main advantage of most programming approaches;
    • adherence to the D.R.Y. (Don't Repeat Yourself) principle, which means that the same
    functionality can be added quickly and efficiently to different areas;
    • refactoring a part of the application or all of it requires changes in fewer locations;
    • readability;
    • easier test coverage.
    At the component level, a unidirectional data flow is configured (data is passed down the
component tree, and events caused by the user's interaction with the system are passed up), which is
shown in Figure 6.
    The information system outputs the following data: raster tiles of the constructed
interpolation of the air quality index, stored on Amazon S3, and station data transmitted via
HTTP requests.
    NgRx is used to manage the application state, which makes it possible to track changes reactively.
The scheme of the state manager is shown in Figure 7.
    To display data to the user, the system uses the Mapbox GL JS library. Station data received
from the server is converted to the GeoJSON format and then displayed as a vector
layer on the map, as shown in Figure 8.
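
The conversion of station records to GeoJSON can be sketched as follows. The sketch is written in Python for consistency with the rest of the examples, although in the described system this step belongs to the client; the record fields `id`, `lat`, `lon`, and `aqi` are hypothetical.

```python
def stations_to_geojson(stations):
    """Convert a list of station records to a GeoJSON FeatureCollection."""
    features = []
    for station in stations:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON stores coordinates as [longitude, latitude].
                "coordinates": [station["lon"], station["lat"]],
            },
            "properties": {"id": station["id"], "aqi": station["aqi"]},
        })
    return {"type": "FeatureCollection", "features": features}
```

A collection in this form can be used directly as a GeoJSON source for a map layer, with the `properties` of each feature available for styling markers and filling the station information window.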
    To reduce the rendering load, clustering is applied to the stations: nearby
stations are grouped into a common marker (cluster). Mapbox GL uses greedy clustering as the basis of
its clustering algorithm, which works as follows: a point is selected from the data set and all points
within a certain radius around it are found; a new cluster is formed from the selected point and its
neighbors; then a new point that is not yet part of any cluster is selected, and the previous steps are
repeated until all points have been visited [25].
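
The greedy procedure described above can be sketched in a few lines. This is a simplified single-level illustration with planar distances, not the code used by Mapbox GL; the function name `greedy_cluster` is hypothetical.

```python
import math


def greedy_cluster(points, radius):
    """Group (x, y) points into clusters using the greedy procedure."""
    unvisited = list(points)
    clusters = []
    while unvisited:
        # Take any unvisited point as the seed of a new cluster.
        seed = unvisited.pop(0)
        members = [seed]
        remaining = []
        for point in unvisited:
            distance = math.hypot(point[0] - seed[0], point[1] - seed[1])
            if distance <= radius:
                members.append(point)  # neighbor joins the seed's cluster
            else:
                remaining.append(point)
        unvisited = remaining
        clusters.append(members)
    return clusters
```

For the points (0, 0), (0.5, 0), and (10, 10) with a radius of 1, the first two are merged into one cluster and the third forms a cluster of its own. The result depends on the order in which seeds are chosen, which is the price of the algorithm's speed.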


Figure 6: An example of data exchange between components


Figure 7: The scheme of the state manager

    When the user clicks on a cluster, the map zooms in until the clicked cluster
disappears. If the user clicks on a marker that is not a cluster, a modal window with information about
the station opens, together with a panel that shows the daily indicators (minimum, maximum, and average)
of the air quality index during the week (the data interval depends on how often the station is
polled or updated). This functionality is shown in Figure 9.
    In addition, the system allows the user to change the language (Ukrainian and English are available),
save the current state of the map as a raster image to the local machine, and switch to other
modes:
    • Fire map;
    • Pollution modeling map in Kryvyi Rih;
    • Radiation contamination map.

Figure 8: Map with ground stations in Eastern Europe


Figure 9: Station information is displayed after clicking on the marker

8. Conclusion
    The developed air quality monitoring system aggregates and analyzes data. The analysis
involves averaging the values of pollutant concentrations and then calculating the air
quality index according to European and American standards. Pollution maps were created using
deterministic interpolation methods such as IDW or geostatistical methods such as Kriging and its
modifications. As additional post-processing (besides specifying the color scheme), mask
trimming and tiling operations are used.
    The Weather Research and Forecast (WRF) model was used to predict temperature from the data of
Ukrainian stations. The forecasting results were compared with an ARIMA forecast, which showed a lower
prediction quality. In the future, we are going to improve the WRF prediction by using a neural network
as a post-processing tool to correct the deviation of the model.
    The use of Docker facilitates scaling the system and deploying it on the server. At present, the
whole infrastructure runs together as a monolithic system. After the forecast is improved, we will try
to make the system more flexible and stable by moving separate parts of the project to their own server
instances, although such a solution may be limited by the available funding.
    The described architecture and results of modeling are used in the YourAirTest air monitoring
system.

9. References
[1] V.Ye. Kolesnik, O.O. Borysovs'ka, A.V. Pavlychenko, A.L. Shirin. Determination of trends and
     regularities of occurrence of emergency situations of technogenic and natural character in Ukraine.
     Naukovyi visnyk Natsionalnoho hirnychoho universytetu. 2017. № 6. - P. 124-131.
[2] Air quality in Europe, 2022. URL:
     https://www.eea.europa.eu/publications/air-quality-in-europe-2022.
[3] S. Han, W. Kundhikanjana, P. Towashiraporn, D. Stratoulias. Interpolation-Based Fusion of
     Sentinel-5P, SRTM, and Regulatory-Grade Ground Stations Data for Producing Spatially
     Continuous Maps of PM2.5 Concentrations Nationwide over Thailand. Atmosphere 2022, 13,
     161. doi: 10.3390/atmos13020161.
[4] J. Gitahi, M. Hahn, (2020). High-resolution urban air quality monitoring using Sentinel satellite
     images and low-cost ground-based sensor networks. E3S Web of Conferences, 171, 02002.
     doi: 10.1051/e3sconf/202017102002.
[5] L. Chen, J. Wang, H. Wang, T. Jin. Urban Air Quality Assessment by Fusing Spatial and Temporal
     Data from Multiple Study Sources Using Refined Estimation Methods. ISPRS Int. J. Geo-Inf.
     2022, 11, 330. doi: 10.3390/ijgi11060330.
[6] V.I. Olevskyi, V.V. Hnatushenko, G.M. Korotenko, Yu.B. Olevska, Y.O. Obydennyi. Application
     of two-dimensional Padé-type approximations for image processing. Radio Electronics, Computer
     Science, Control, 2023, № 1, P.99-106. doi: 10.15588/1607-3274-2023-1-10.
[7] M. Usama. Urban Air Quality Measurements: A Survey. Preprints.org; 2022. doi:
     10.20944/preprints202204.0232.v1.
[8] T. Singh, N. Sharma, Satakshi, M. Kumar. Analysis and forecasting of air quality index based on
     satellite data. Inhal Toxicol. 2023 Jan-Feb;35(1-2):24-39. Epub 2023 Jan 5. PMID: 36602767. doi:
     10.1080/08958378.2022.2164388.
[9] A. Ahmed, A. Ali, M. Mahboob, F. Humaira. Comparison between Local and Global Methods to
     Develop AQI in Representing the Spatial Pattern of Air Quality of Dhaka City. The Dhaka
     University Journal of Earth and Environmental Sciences. 2023. 131-149. DOI:
     10.3329/dujees.v11i1.63716.
[10] O. Serant, N. Yarema, A.R. Sohor, M.S. Geba. Stvorennya ekologichnih kart Lvivschini za danimi
     monitoringu atmosfernogo povitrya. Young Scientist. 2018, 5. 23-27.
[11] F.R. Afghan, H. Habib, N.A. Akhundzadah, W. Wafa, M. Shirzad, K. Sahak, S.K. Hashmi,
     M. Mujeeb, K. Wardak, M. R. Ahmadzai. Customization of GIS for spatial and temporal analyses
     of Air Quality Index trends in Kabul city. Modeling Earth Systems and Environment. 2022. doi:
     10.1007/s40808-022-01396-5.
[12] H. Zhang, Y. Nie, Q. Deng, Y. Liu, Q. Lyu, B. Zhang. Spatio-Temporal Changes in Air
     Quality of the Urban Area of Chongqing from 2015 to 2021 Based on a Missing-Data-Filled
     Dataset. Atmosphere. 2022 13. 1473. doi: 10.3390/atmos13091473.
[13] A. S. Shihab. Assessment of Air Quality through Multiple Air Quality Index Models – A
     Comparative Study. Journal of Ecological Engineering. 2023 24. 110-116. doi:
     10.12911/22998993/159398.
[14] R. Liu, L. Pang, Y. Yang, G. Yidian, G. Yuxing, G. Bei, L. Feng, W. Li. Air Quality—Meteorology
     Correlation Modeling Using Random Forest and Neural Network. Sustainability. 2023 15. 4531.
     doi: 10.3390/su15054531.
[15] A. Fassò, J. Rodeschini, A. Moro, Q. Shaboviq, P. Maranzano, M. Cameletti, F. Finazzi, N. Golini,
     R. Ignaccolo, P. Otto. Agrimonia: a dataset on livestock, meteorology and air quality in the
     Lombardy region, Italy. Scientific Data. 2023. doi: 10.1038/s41597-023-02034-0.
[16] I.N. Garkusha, V.V. Hnatushenko, V.V. Vasyliev, (2017). Using Sentinel-1 data for monitoring of
     soil moisture. 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).
     doi:10.1109/igarss.2017.8127291.
[17] S. Jutz, M. Milagro-Pérez. Copernicus: the European Earth Observation programme. Revista de
     Teledetección. 2020. doi: 10.4995/raet.2020.14346.
[18] J. Hinestroza-Ramirez, J. Rengifo-Castro, O. Quintero, A. Yarce Botero, A. Rendon-Perez. Non-
     Parametric and Robust Sensitivity Analysis of the Weather Research and Forecast (WRF) Model
     in the Tropical Andes Region. Atmosphere. 2023 14. 686. doi: 10.3390/atmos14040686.
[19] S. Antonenko, T. Bulana, B. Molodets. Modeliuvannia prohnozuvannia nebezpeky vynyknennia
     nadzvychainoi sytuatsii. System technologies. Vol. 1 No. 120 (2019). P.44-49.
[20] P. Jiménez, J. Dudhia, G. Thompson, J. Lee, T. Brummet. Improving the cloud initialization in
     WRF-Solar with enhanced short-range forecasting functionality: The MAD-WRF model. Solar
     Energy. 2022. 239. 221-233. doi: 10.1016/j.solener.2022.04.055.
[21] GDAL Grid Tutorial, 2023. URL:
     https://gdal.org/tutorials/gdal_grid_tut.html#interpolation-of-the-scattered-data.
[22] Richard Bullington-McGuire, (2020). Docker for Developers: Develop and run your application
     with Docker containers using DevOps tools for continuous delivery.
[23] V. Hnatushenko, Vik. Hnatushenko, N. Dorosh, N. Solodka, O. Liashenko. Non-relational
     approach to developing knowledge bases of expert system prototype. Naukovyi Visnyk
     Natsionalnoho Hirnychoho Universytetu, 2022, № 2. P.112-117. doi: 10.33271/nvngu/2022-2/112.
[24] R. Jadhav. Role of angular in web development. 2021 8. 783-785.
[25] R. Nétek, J. Brus, Tomecka. Performance Testing on Marker Clustering and Heatmap
     Visualization Techniques: A Comparative Study on JavaScript Mapping Libraries. ISPRS
     International Journal of Geo-Information. 2019 8. 348. doi: 10.3390/ijgi8080348.