=Paper= {{Paper |id=Vol-3041/424-428-paper-78 |storemode=property |title=Intelligent Environmental Monitoring Platform |pdfUrl=https://ceur-ws.org/Vol-3041/424-428-paper-78.pdf |volume=Vol-3041 |authors=Alexander Uzhinskiy }} ==Intelligent Environmental Monitoring Platform== https://ceur-ws.org/Vol-3041/424-428-paper-78.pdf
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021



        INTELLIGENT ENVIRONMENTAL MONITORING
                      PLATFORM
                                          A.V. Uzhinskiy

     Joint Institute for Nuclear Research, 6 Joliot-Curie, Dubna, Moscow region, 141980, Russia

                                       E-mail: auzhinskiy@jinr.ru


Air pollution has a significant impact on human and environmental health. The aim of the UNECE
International Cooperative Program (ICP) Vegetation within the United Nations Convention on Long-
Range Transboundary Air Pollution (CLRTAP) is to identify the main polluted areas of Europe,
produce regional maps and further develop an understanding of long-range transboundary pollution.
The program is implemented in 43 countries of Europe and Asia. Mosses are collected at thousands of
sites. The development of the data management system (DMS) for the ICP Vegetation program was
initiated in 2016 at the Meshcheryakov Laboratory of Information Technologies of the Joint Institute
for Nuclear Research. The DMS offers good options to simplify and automate the environmental
monitoring process. We use several powerful technologies to provide a new level of service for ICP
Vegetation participants. The platform has some worthwhile analytics, classification and prediction
abilities. The current architecture, workflow, and principles of data processing and analysis will be
presented.

Keywords: environmental monitoring, air pollution, data management, intelligent platform,
intellectual data processing, machine learning, neural networks.



                                                                                     Alexander Uzhinskiy

                                                             Copyright © 2021 for this paper by its authors.
                    Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




                                                   424
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021



1. Introduction
          Air pollution is recognized as the fifth largest threat to human health [1]. According to the
World Health Organization, 7 million people die from polluted air every year. Since the late 1960s in
the USA and since 1970 in Europe, regulatory conventions and acts to protect health and the
environment from air pollution have been signed. At present, there are many regional, local and global
programs focused on the determination of air quality. In most cases, they use monitoring station
networks or mobile measurement stations to provide information on regulatory air pollutants such as
gaseous pollutants and particulate matter (PM). More detailed information on contamination can be
obtained using more advanced techniques. During the last several decades, biomonitoring methods
have been used to get data on heavy metals (such as antimony, mercury, lead, etc.), organic pollutants
(benzo[a]pyrene), and radionuclides [2]. One of the projects aimed to identify the main polluted areas
in Europe and produce regional maps is the UNECE International Cooperative Program (ICP)
Vegetation. The program functions within the United Nations Convention on Long-Range
Transboundary Air Pollution (CLRTAP). Transboundary pollution is a real threat clearly demonstrated
by the sand from the Sahara desert lying on European streets. The UNECE ICP Vegetation program
brings together researchers from 43 countries of Europe and Asia. Samples are collected at thousands
of sites. In 2016, a Data Management System (DMS) of the UNECE ICP Vegetation was developed at
the Meshcheryakov Laboratory of Information Technologies of the Joint Institute for Nuclear
Research [3]. The DMS is designed to provide the ICP Vegetation community with a unified system
for gathering, storing, analyzing, processing, and sharing biological monitoring data. The DMS can
now considered to be an intelligent environmental monitoring platform. Since the first version of the
platform presentation, a mobile application has been developed to simplify the process of collecting
and verifying data. Deep learning models for image classification have been created. Statistical and
neural models for pollution prediction based on remote sensing data have been elaborated. The
analytical capabilities of the platform have been expanded.

2. Intelligent platforms
          The level of automation and adoption of modern information technologies in environmental
monitoring programs is constantly increasing, although it lags far behind areas where the use of
advanced technology can lead to a rapid economic impact. Nevertheless, over the past decade, various
powerful technologies have been used in environmental pollution control projects, which makes it
possible to provide a new level of service, as well as the quality and speed of researches. Now we can
talk about intelligent platforms capable of generating new knowledge based on incoming and available
data and, in some cases, making decisions that previously required the competence of an expert. Here
is a short list of such technologies: The Internet of things (IoT) specifies the principles of connecting
and exchanging data between physical objects that are embedded with sensors and other objects,
programs, and systems. Many platforms use IoT technologies to organize sensor networks and process
environmental monitoring data. It allows one to minimize the number of errors, automate routine
processes and speed up data-gathering routines. Big Data is a field that treats ways to analyze,
systematically extract information, or otherwise, deal with datasets that are too large or complex to
handle using traditional data-processing application software. In the case of environmental monitoring,
the data we have to work with can be both large if we deal with a huge sensor network and complex if
we deal with sampling sites' metadata. Artificial intelligence (AI) is a wide-ranging branch of
computer science concerned with building smart machines capable of performing tasks that typically
require human intelligence. In environmental monitoring, there are always operations demanding an
expert opinion. AI technologies can execute primary analysis and save the expert’s time. Machine
learning is a method of data analysis that automates analytical model building. It is a branch
of artificial intelligence based on the idea that systems can learn from data, identify patterns and make
decisions with minimal human intervention. Both classification and prediction tasks of ML are highly
useful for environmental monitoring. Many other technologies such as robotics, remote sensing,
drones, etc. can also be mentioned. The general idea is that the use of one or more of these
technologies can enhance the abilities of the digital platform and allows it to perform intelligent
functions.


                                                   425
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021



3. Data & Workflow
         ICP Vegetation participants collect moss samples at thousands of sites. They should record the
metainformation about sampling sites required by the UNECE ICP Vegetation manual. This data is
used in the interpretation stage of research. We had to develop a mobile application to simplify
metadata management. Now latitude, longitude, and altitude are set automatically at sampling sites,
and most of the required parameters are implemented as lists. The mobile application allows getting
visual information, for example, pictures of samples and of sampling sites. Such images can be used to
verify the correctness of the input data. The given approach dramatically reduces the number of errors
in metadata. Once collected, the samples are processed using different techniques, such as neutron
activation analysis, to determine the concentrations of heavy metals, persistent organic pollutants,
nitrogen, or radionuclides. Each sampling site has a unique ID that is used to import information on
concentrations. Participants can manipulate data, create different kinds of maps, run prediction tasks,
and get analytical reports on the platform. With simple statistical reports and geo indexes, it is also
possible to carry out cluster or principal component analysis. Participants can build historical trends
and compare the data with data from other countries or regions. For example, to better understand the
global situation, the median values of heavy metal pollution with bordering countries and regions can
be shown in one diagram. Coordinators have access to all tools of ordinary participants; in addition,
they can perform group operations with data, receive summary reports and build global maps of
pollution.


4. Architecture
         To achieve the necessary scalability at the resource level, the platform is built on a cloud
infrastructure based on Open Nebula. The amount of data coming to the platform from participants is
rather small, but it has a complex structure. We have to manage collections of sampling sites,
persistent organic pollutants, intercomparison data, etc. Each object can have tens to hundreds of
fields, as well as geospatial data. Data automatically collected for forecasting is estimated at millions
of records. In such conditions, it is preferable to use NoSQL solutions. In our case, this is MongoDB,
which allows one to work with geospatial data and achieve high performance with correctly specified
indexes. As a web server, we use Nginx, the performance of which is sufficient for our tasks.

       Samples collection         Samples analysis                Data analysis   Data presentation          Prediction/Controle




                                                                                            Google Earth Engine




                                                     Tensorflow
                                                       Keras                                      HybriLIT




                            Figure 1. Architecture of the environmental monitoring platform

        The server part of the platform provides client and program interfaces and also organizes the
work of many services. Some tasks take time to complete, for example, collecting additional data for
models or selecting their optimal parameters. To implement them, a microservice architecture is used.
It allows one to scale the solution and make changes only to those processes where changes are taking


                                                                    426
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021



place, without affecting other parts of the platform. For tasks related to machine learning and neural
networks, the JINR HybriLIT heterogeneous computing platform is utilized. Machine learning models
are implemented in Python using the Keras and TensorFlow libraries. To get satellite imagery and
geospatial datasets, we use the Python interface of the Google Earth Engine platform.


5. Intelligent functions
         The identification of moss species is important for the quality of analysis. We want to be able
to classify moss images in our mobile application. Several deep learning models have been tested to
solve recognition tasks on a limited training dataset. We have only 599 images of the five most
demanded moss species. The current implementation uses a model of the Siamese neural network with
a triplet loss function, the average accuracy of which amounts to 97.7% [4]. Forecasting is an
important stage of environmental monitoring to fill data gaps. A forecasting mechanism based on the
use of machine learning together with remote sensing data has been implemented within the platform.
The approach is not universal, but some chemical elements, such as aluminum, copper, antimony,
arsenic, chromium, nickel, iron, and vanadium, have shown good results. Images of various satellite
programs are used to obtain so-called indices, which act as additional data when training the model
and as basic data when conducting the forecast. The index includes the name of the satellite program,
the data of which is used, the size of the analyzed area, the identifier of the spectral channel (band) in
which the image is made, and the mathematical function applied to the digital matrix of the obtained
image [5]. We have used the Google Earth Engine platform to automatically calculate the indices. This
platform has advanced mechanisms for searching, processing, and analyzing satellite data. The data
presented at GEE has already been preprocessed. There are more than 100 satellite programs and
modeled datasets. Some programs have image resolution up to 15 - 30 meters. Our platform
microservices are used to collect indexes, build global and local models, select optimal parameters,
and predict contamination.




Figure 2. Concentration of Cu in summer in Belgrade: a) real measurements, and b) prediction values;
   area A represents the central part of Old Belgrade with a permanently high traffic flow; area B
                                  represents a large railway terminal

         In the current implementation, statistical models or neural networks are used depending on the
amount of training data. We focus on regression and classification tasks, but classification is
prioritized for several reasons. Firstly, model data is mainly needed to build maps in which the
gradation of pollution levels is already well known. Secondly, the determination of the accuracy
metrics is clearer and more reliable for classification tasks. Thirdly, there are fewer sampling points
that have a high level of contamination than points with a normal level, and better results can be
achieved using balancing techniques for training datasets. At present, for local and regional maps of
some elements, the accuracy of models reaches 90-95%.


                                                   427
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021



6. Conclusion
         The intelligent environmental monitoring platform provides a new level of service for UNECE
ICP Vegetation participants. The platform has some worthwhile analytics, classification and prediction
abilities based on modern technologies. The obtained results motivate several potential projects to
consider our platform as a solution to their tasks. The platform will not only enhance the current
functionality, but also provide new opportunities. One of the uppermost tasks is the automation of the
environmental monitoring process based on modeling. Gathering satellite indexes is a much faster
process than collecting and processing moss samples. We can gather new data and make predictions
several times a year. If the contamination level in some regions exceeds certain limits, the platform
will send notifications to the corresponding persons. The integration of data on PM and gaseous
pollutants from air quality monitoring stations to the platform is considered. The mapping of PM
concentrations with heavy metal concentrations can widen the analytical abilities of the platform. We
are working on mechanisms of collecting and importing data on citizens' health to the platform. It will
enable the comparison of contamination levels and human diseases in some areas. We are pursuing the
possibility of obtaining information on diseases both from the Russian Compulsory Health Insurance
Fund and from social networks using Big Data technologies.


References
[1] Cohen A., BrauerM., Burnett R., Anderson H.R., Frostad J., Estep K., Balakrishnan K.,
Brunekreef B., Dandona L., Dandona R., Feigin V., Freedman G., Hubbell B., Jobling A., Kan H.,
Knibbs L., Liu Y., Martin R., Morawska L., Pope III C.A., Shin H., Straif K., Shaddick G., Thomas
M., van Dingenen R., van Donkelaar A., Vos T., Murray C.J.L. & Forouzanfar M.H., Estimates and
25-year trends of the global burden of disease attributable to ambient air pollution: an analysis of data
from the Global Burden of Diseases Study 2015 // Lancet, 389 (2017), 1907‒1918.
[2] Harmens H., Norris D.A., Steinnes E., Kubin E., Piispanen J., Alber R., Aleksiayenak Y., Blum
O., Coskun M., Dam M., De Temmerman L., Fernández J.A., Frolova M., Frontasyeva M., González
Miqueo L., Grodzinska K., Jeran Z., Korzekwa S., Krmar M., Kvietkus K., Leblond S., Liiv, S.
Magnússon S.H., Mankovská B., Pesch R., Rühling Å., Santamaria J.M., Schröder W., Spiric Z.,
Suchara I., Thöni L., Urumov V., Yurukova L. & Zechmeister H.G., Mosses as biomonitors of
atmospheric heavy metal deposition: spatial patterns and temporal trends in Europe // Environ Pollut,
158 (2010), 3144–3156.
[3] Frontasyeva M., Kutovskiy N., Nechaevskiy A., Ososkov G., Uzhinskiy A. Cloud platform for
data management of the environmental monitoring network: UNECE ICP Vegetation case // CEUR
Workshop Proceedings, 2016, 1787, pp. 224–229
[4] Uzhinskiy AV, Ososkov GA, Goncharov PV, Nechaevskiy AV, Smetanin AA. One shot
learning with triplet loss for vegetation classification tasks // Computer Optics 2021; 45(4): 608-614.
DOI: 10.18287/2412-6179-CO-856.
[5] A. Uzhinskiy, M. Aničić Urošević, M. Frontasyeva. Prediction of air pollution by potentially
toxic elements over urban area by combining satellite imagery // Moss Biomonitoring Data and
Machine Learning. Ciencia e Tecnica Vitivinicola Journal, ISSN:2416-3953, Vol. 35, No. 12, 2020.




                                                   428