Ontological Description of Meteorological and Climate Data Collections

© A.A. Bart © V.V. Churuksaeva
Tomsk State University, Tomsk, Russia
bart@math.tsu.ru

© A.Z. Fazliev © A.I. Privezentsev © E.P. Gordov © I.G. Okladnikov © A.G. Titov
Institute of Atmospheric Optics SB RAS, Tomsk, Russia
faz@iao.ru remake@iao.ru gordov@scert.ru oig@scert.ru titov@scert.ru

Abstract. The first version of the primitive OWL-ontology of collections of climate and meteorological data of the Institute of Monitoring of Climatic and Ecological Systems SB RAS is presented. The ontology is a component of expert and decision-making support systems intended for quick search for climate and meteorological data suitable for solving a certain class of applied problems.

Keywords: ontology description of object domains, systematization of domain data, climate and meteorological data.

Proceedings of the XIX International Conference “Data Analytics and Management in Data Intensive Domains” (DAMDID/RCDL’2017), Moscow, Russia, October 10-13, 2017

1 Introduction

Today every large meteorological center uses its own meteorological models to calculate climate and meteorological parameters. These models can differ both in the level of detail and in the set of physical parameters whose values are calculated. During the reanalysis of a meteorological situation, key meteorological parameters corresponding to measurements at weather stations are usually taken into account.

The results of climatic numerical simulation, weather forecasting, or reanalysis of meteorological fields are collections of meteorological parameters that characterize the state of the atmosphere. They are represented by data arrays in common formats, e.g., GRIB [7], netCDF [12], HDF5 [8], etc.

At the Institute of Monitoring of Climatic and Ecological Systems SB RAS (IMCES SB RAS), the data processing environment [3] has been developed for representing collections of meteorological data. The environment is supplied with sets of metadata that characterize the physical parameters contained in these collections. In practice, the environment proved to be restricted to local applications only. Including external applications led to the creation of a new system, the virtual information platform “Climate+” [17], where data are represented in the netCDF format.

When climate data from the collections of numerous data producers are used together, the physical parameters in these collections can be identified ambiguously. The meaning of the physical parameters in the collections agrees with the physical parameters recommended by the World Meteorological Organization (WMO). They are described in the taxonomy of the WMO Codes Registry ontology [19]. They are also described in the taxonomy of the GRIB Discipline Collection ontology [16], which is intended for use in the Climate Information Platform for Copernicus (CLIPC).

The ontology description of data collections is given in the form of a primitive (simplified) formal OWL-ontology. It is intended for selecting data collections within an expert system, which can be used when solving an applied task of an object domain.

The ontology approach selected for solving the stated problem is as follows. An ontology description is constructed for an applied problem. In addition to the physical statement, the description should include the mathematical statement of the task, i.e., a mathematical model with equations. Variables, which conform to the WMO classification, and constraints are described in the form of an OWL-ontology. On the one hand, the set of parameters includes common meteorological parameters, such as sea level pressure, surface pressure, air temperature and humidity, wind speed and direction, and so on. This allows the computed values to be compared with weather station measurements. On the other hand, meteorological and climate models supplemented by an applied task form a component of a more complex model. In that model, the results of prognostic calculations of climate and meteorological parameters are used to solve applied problems in different fields of human activity. This, in turn, enriches the collections of climate and meteorological data with values of new physical parameters.

2 Virtual data processing environment

The approaches used to create the prototype of a subject virtual data processing environment (VDPS) were mainly developed during the design of the “Climate” web GDS [4, 5]. The VDPS serves for the analysis, estimation, and forecast of the impacts of global climate change on the natural environment and climate of a region. This subject GDS has been designed using up-to-date information and communication technologies and is based on the concepts of spatial data infrastructure (SDI) [2, 10]. It provides the basis of a software infrastructure for the integrated use of geophysical data and for the information support of multidisciplinary research in modern quantitative meteorology. We have selected it as the subject component of VDPS for Earth sciences.

A web geoportal [1, 9] is a single access point to subject spatial data, processing procedures, and results. The portal allows a user to search for geoinformation resources in metadata catalogues and to form samples of spatial data according to their characteristics (access functionality). It also allows the user to manage tools and applications for data processing and mapping.

The GDS Web Client [6, 13] is the main tool of the user's desktop. It meets the OGC requirements for web services: spatial data visualization (Web Map Service, WMS), data representation in vector (Web Feature Service, WFS) and raster (Web Coverage Service, WCS) formats, and their geospatial processing. It provides access to collections of climate data, to tools for their analysis, and to visualization of the results via a typical graphical web browser. The Web Client satisfies the general requirements of the INSPIRE standards. It allows the user to select a data set, the processing type, and the geographic region for the analysis of processes. It also represents the processing results as WMS/WFS map layers in raster (PNG, JPG, GeoTIFF), vector (KML, GML, Shape), and binary (NetCDF) formats.

Today, the VDPS prototype combines data collections (reanalyses, climate simulation results, and weather station measurements) within the unified geoportal. It supports the statistical analysis of archived and requested data and provides access to the WRF and «Planet Simulator» models. In particular, a user can run a VDPS-integrated model, preprocess the results, process and analyze them numerically, and obtain the results in graphical form. The prototype gives specialists participating in multidisciplinary research prompt tools for the integrated study of climate and ecological systems on global and regional scales. With these tools, a user who does not know programming can process and graphically represent multidimensional observation and simulation data in a unified interface using a web browser.

3 VDPS prototype capabilities

Support for the following data sets is built into the prototype:
- NCEP/NCAR reanalysis, ed. II;
- JMA/CRIEPI JRA-25 reanalysis;
- ECMWF ERA-40 reanalysis;
- ECMWF ERA Interim;
- MRI/JMA APHRODITE's Water Resources Project data;
- DWD Global Precipitation Climatology Centre data;
- GMAO Modern-Era Retrospective analysis for Research and Applications (MERRA);
- reanalysis of the joint project «Monitoring atmospheric composition and climate (MACC)»;
- NOAA-CIRES Twentieth Century Global Reanalysis, ver. II;
- NCEP Climate Forecast System Reanalysis (CFSR);
- simulation results obtained with global and regional climate and meteorological models.

Observation data from weather stations on the territory of the former USSR for the 20th century, included in the PostGIS database, are also accessible.

The following data processing is available:
1. Statistical characteristics of meteorological parameters: sample mean, variance, excess (kurtosis), median, minimum and maximum, and asymmetry (skewness).
2. Derived climate parameters: vegetation period duration, sum of effective temperatures, Selyaninov hydrothermal coefficient.
3. Periodic variations: standard deviation, norms, aberrations, amplitudes of diurnal and annual variations.
4. Non-periodic variations: duration and frequency of occurrence of atmospheric phenomena with meteorological parameters below or above the limits specified at different time points.

Then a user can either analyze the results or continue adding new layers to the map. To study the results, the user can select a geographical region and scale the map. The user can also get values from all layers at a point and further process earlier results (e.g., compare data from different layers). In addition to directly analyzing geophysical data, a user can carry out joint research with other users, share the results, and use their own data collections in the processing. In general, this hardware-software complex provides distributed access, processing, and visualization of large collections of geospatial data using cloud technologies.

The data processing environment “Climate” developed at IMCES SB RAS limits users to local software applications. A current task is to extend the environment with external user applications. For this, the corresponding problems should be specified in general terms. Below we describe one possible class of problems connected with decision-making.

4 General definition of the problem

The “Climate+” virtual information platform includes collections of meteorological and climate data. It is intended for representing the data using GIS technologies. Its further development aims to let researchers use selected data sets or their parts as input data. Most collections include data related to some (not all) spatiotemporal objects of the Earth, and different collections often include different sets of physical parameters. Searching for the required spatiotemporal objects and their meteorological and climate characteristics therefore required a corresponding expert system. It is built on a knowledge base about the spatial objects of the data collections and their parameters.

Figure 1. Simplified block diagram of the “Climate+” platform modification

Figure 1 shows a simplified block diagram that underlies the modification of the “Climate+” platform. There are three groups of subsystems:
- meteorological and climate data collections;
- a subsystem for working with knowledge bases (an expert system for selecting input data for applied tasks and a decision-making support system);
- applied tasks with their input and output data.

The data representation services are omitted.

In this work, we discuss the creation of a knowledge base for the expert system. The main problem solved is the substantiation of the solution of the reduction problem [20]. In other words, it is the construction of typical individuals of an OWL-ontology that characterize the properties of spatiotemporal objects from the collections. In our solution, the development of the conceptual part of the ontology (T- and R-box) is connected with the classification of meteorological and climate parameters, which is briefly described below.

5 Taxonomy of meteorological parameters

The OWL DL language [14] is used for the ontology description of the domain, which generalizes, in particular, spatiotemporal objects. These objects can be an air layer over a bounded territory or the upper soil layer on this territory. In more specific cases, they can be forests, fields, or long roads. Physical and chemical processes are connected with the objects; they are described by numerical models and used in calculations. The calculations require input values of the physical parameters. The processes under study can relate to different temporal and spatial scales and can be described at different levels of detail. Note that coupling several mathematical models requires knowledge of the sets of their input and output parameters and of the spatiotemporal characteristics of these parameters.

The taxonomy of physical parameters allows forming sets of properties of the spatiotemporal objects of a domain for solving specific applied tasks. This taxonomy is used in the OWL-ontology for T-box construction.

When a decision-making support system is developed on the basis of both meteorological and climate data, the parameters should be matched. Therefore, the WMO classification in the version of [11] is included in the ontology. This matching allows applied tasks of the domain to be described in common terms.

There are climatic and meteorological resources [16, 19] that use the WMO classification of names of meteoparameters for the GRIB data storage format [7]. In particular, the WMO Codes Registry was created for aviation to support data exchange in the AvXML format; it is based on the RDF and SKOS recommendations.

In the primitive OWL-ontology of climate information resources described below, we created classes and individuals that correspond to the names of meteorological parameters according to [11], e.g., the Meteorological_Products class and its subclasses. Individuals that unambiguously characterize physical parameters by their name [11] have been created in each of the following subclasses: Thermodynamic_Stability_category, Atmospheric_Chemical_Constituents_category, Electrodynamics_category, Mass_category, Long-wave_radiation_category, Temperature_category, Short-wave_radiation_category, Aerosols_category, Moisture_category, Radiology_Imagery_category, Momentum_category, Trace_Gases_category, Cloud_category, and Physical_Atmospheric_category.

For the INMCM4 collection, which corresponds to the output data of the INMCM4 climate model of general atmospheric and ocean circulation [18], classes and subclasses corresponding to the model variables were created. These classes agree with the corresponding WMO classes.

6 Primitive ontology of “Climate+” platform data

The developed ontology of climate information resources is formalized in OWL DL. It describes the current state of the collections of data arrays of the data processing environment, one of the main Russian information resources on climate data. Numerical data are represented by data arrays stored in netCDF files. The data arrays are grouped in data sets. All data arrays in a set should:
(a) be obtained on the same temporal and spatial grid;
(b) cover the same time interval;
(c) be obtained under the same simulation or observation conditions (if possible);
(d) be represented by a set of netCDF files that include the same physical parameters.

The data sets are grouped in data collections. A data collection is an ensemble of data sets obtained by an organization within a project. These data sets are represented on different spatial or temporal grids or for different model scenarios. In particular, a collection can consist of a single data set.

The basic classes in the OWL-ontology are: Collection, Spatiotemporal_object, Organization, Data_set, Data_array, Scenario, Spatial_resolution, Physical_quantity, Physical_quantity_values, Unit, Longitudes_array, Time_step, Latitudes_array, Height_levels_array, and Times_array.

The spatiotemporal system is a four-dimensional object determined by arrays of numerical values of longitudes (Longitudes_array), latitudes (Latitudes_array), height levels (Height_levels_array), and time labels (Times_array). These arrays are subclasses of the class of lists of values of a physical parameter (Physical_quantity_values). They are therefore numerical arrays of one physical parameter (Physical_quantity) in certain measurement units (Unit). They can be described by the number of members of the array (has_number_of_values), its minimum value (has_minimum_value) and maximum value (has_maximum_value), or by the numerical values of the parameter (has_value).

A data array (Data_array) is an ordered list of numerical values of a physical parameter (Physical_quantity). The parameter is a property of the spatiotemporal system (has_spatiotemporal_object), and the list gives its value at each 4D point (longitude, latitude, height level, and time) of the spatiotemporal system (Spatiotemporal_object). In the OWL-ontology, a data array (Data_array) is a subclass of the class Physical_quantity_values. Hence, it is a numerical array of values of one physical parameter (Physical_quantity) in certain measurement units (Unit). It is described by the number of members (has_number_of_values) and by the minimum (has_minimum_value) and maximum (has_maximum_value) values of the physical parameter.

A data array (Data_array) belongs (has_data_array) to a data set (Data_set). A data set belongs to one collection (Collection) and differs from the other data sets in the model scenario (Scenario), spatial resolution (Spatial_resolution), and time step (Time_step). A data collection (Collection) consists of (has_data_set) data sets (Data_set) and belongs (has_organization) to one organization (Organization). The OWL properties of the climate data ontology are given in Tables 1 and 2.

Table 1. Object properties of the ontology of climate information resources

Domain                    Object property            Range                  id
Collection                has_organization           Organization           o01
Collection                has_data_set               Data_set               o02
Data_set                  has_scenario               Scenario               o03
Data_set                  has_spatial_resolution     Spatial_resolution     o04
Data_set                  has_time_step              Time_step              o05
Data_set                  has_data_array             Data_array             o06
Physical_quantity_values  has_physical_quantity      Physical_quantity      o07
Physical_quantity_values  has_unit                   Unit                   o08
Data_array                has_spatiotemporal_object  Spatiotemporal_object  o09
Spatiotemporal_object     has_longitudes_array       Longitudes_array       o10
Spatiotemporal_object     has_latitudes_array        Latitudes_array        o11
Spatiotemporal_object     has_height_levels_array    Height_levels_array    o12
Spatiotemporal_object     has_times_array            Times_array            o13

In Table 1, each object property (second column) is defined by its domain (first column) and range (third column); the fourth column gives its unique identifier. The datatype properties that describe the data arrays are defined in Table 2 in the same way.
Table 2. Datatype properties in the ontology of climate information resources

Domain                    Datatype property     Range  id
Physical_quantity_values  has_number_of_values  int    d01
Physical_quantity_values  has_minimum_value     float  d02
Physical_quantity_values  has_maximum_value     float  d03
Physical_quantity_values  has_value             float  d04
Times_array               has_time_start        str    d05
Times_array               has_time_end          str    d06

Figure 2. Simplified representation of the individual describing the ERAInt data collection

Figure 2 shows a simplified individual of the OWL-ontology of climate information resources. It describes the ERAInt data collection in terms of the RDF formalism [15]. The notation of the figure is as follows:
- individuals of the OWL-ontology are shown as ovals;
- literal values are shown as rectangles;
- arrows show properties, whose unique identifiers from Tables 1 and 2 are given in small rectangles;
- a triple arrow means that the property cardinality may be greater than one;
- three overlapping ovals mean that there may be more than one individual of the OWL-ontology.

The individual “Data_collection” is connected by the property “has_data_set” with the individuals “Data_set”. Each of these is connected by the property “has_data_array” with individuals “Data_array”.

The domain analysis of the climate numerical data arrays of the “Climate+” platform, stored as NetCDF files, allows a primitive ontology of the climate data of this platform to be described in the OWL DL language. The primitive ontology is a simple and easily extensible systematization of the information resources required for further work on the decision-making support system.

To construct the climate data ontology of the “Climate+” platform, software has been developed for forming the fact-based block (A-box). An A-box has been formed for the climate data ontology using this software. Facts have been retrieved from the analysis of 80 TB of climate data of the “Climate+” platform. These data belong to 13 numerical data collections, which include 36 data sets and 793 data arrays. Altogether, the climate data collections include descriptions of 170 spatiotemporal systems and 156 physical parameters that characterize the properties of these systems.

7 Conclusions

The prototype of a subject virtual data processing environment has been developed for researchers, specialists, and decision makers. It gives them access, via a typical web browser, to various geographically distributed and georeferenced resources and to climate data processing services. It includes a geoportal and systems for distributed storage, processing, and provision of spatial data and of the results of their processing. In particular, it allows several subject sets of climate data to be analyzed simultaneously with up-to-date statistical methods. This makes it possible to reveal the impacts of climate change on ecological processes and human activity. After the work on the prototype is finished, various interactive web tools are to be developed for in-depth analysis of climatic variables and their derivatives provided by the subject geoportal.

The developed software is used to process spatial datasets, including observation and reanalysis data. The processing serves the spatiotemporal analysis of recent and probable climate changes, with special focus on extreme climate phenomena in northern latitudes.

The primitive OWL-ontology of the climate and meteorological collections of IMCES SB RAS has been constructed. It can be used to search for and select data for classes of applied problems in coupled decision support systems. The physical parameters of applied tasks are matched with the IMCES SB RAS collections in WMO-accepted terms.

8 Acknowledgment

The authors thank the Russian Science Foundation for supporting this work (development of web services and solution of reduction problems) under grant No. 16-19-10257. We also thank the Russian Foundation for Basic Research (grant 16-07-01028) for supporting the work on the conceptualization of domains, which is partially described in Sections 4, 5, and 6 of this article.

References

[1] Becirspahic, L., Karabegovic, A.: Web Portals for Visualizing and Searching Spatial Data. In: 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, pp. 305-311 (2015). doi: 10.1109/MIPRO.2015.7160284
[2] van der Wel, F. J. M.: Spatial Data Infrastructure for Meteorological and Climatic Data. Meteorol. Appl., 12 (1), pp. 7-8 (2005)
[3] Gordov, E. P., Okladnikov, I. G., Titov, A. G.: Application of Web Mapping Technologies for Development of Information-Computational Systems for Georeferenced Data Analysis. Vestnik NGU, Ser. Information Technologies, 9 (4), pp. 94-102 (2011) (in Russian)
[4] Gordov, E. P., Lykosov, V. N., Krupchatnikov, V. N., Okladnikov, I. G., Titov, A. G., Shulgina, T. M.: Computational-Information Technologies for Monitoring and Modeling of Climate Change and its Consequences. Novosibirsk: Nauka, 199 p. (2013) (in Russian)
[5] Gordov, E. P., Okladnikov, I. G., Titov, A. G.: Information and Computing Web-system for Interactive Analysis of Georeferenced Climatic Data Sets. Vestnik NGU, Ser. Information Technologies, 14 (1), pp. 13-22 (2016) (in Russian)
[6] Gordov, E., Shiklomanov, A., Okladnikov, I., Prusevich, A., Titov, A.: Development of Distributed Research Center for Analysis of Regional Climatic and Environmental Changes. IOP Conf. Series: Earth and Environmental Science, 48, 012033 (2016)
[7] Guide to the WMO Table Driven Code Form Used for the Representation and Exchange of Regularly Spaced Data in Binary Form: FM 92 GRIB Edition 2. World Meteorological Organization Extranet (2003). URL: http://www.wmo.int/pages/prog/www/WMOCodes/Guides/GRIB/GRIB2_062006.pdf
[8] HDF Group: HDF5. URL: https://support.hdfgroup.org/HDF5/
[9] Koshkarev, A. V.: Geoportal as a Tool to Control Geospatial Data and Services. Geospatial Data, 2, pp. 6-14 (2008) (in Russian)
[10] Koshkarev, A. V., Ryakhovskii, A. V., Serebryakov, V. A.: Infrastructure of Distributed Environment of Spatial Data Storage, Search and Processing. Open Education, 5, pp. 61-73 (2010) (in Russian)
[11] NCEP/NCO Production Management Branch: NCEP WMO GRIB2 Documentation. National Weather Service Organization NCEP Central Operations (2005). URL: http://www.nco.ncep.noaa.gov/pmb/docs/grib2/grib2_doc.shtml
[12] Network Common Data Form (NetCDF). URL: https://www.unidata.ucar.edu/software/netcdf/
[13] Okladnikov, I. G., Gordov, E. P., Titov, A. G.: Development of Climate Data Storage and Processing Model. IOP Conf. Series: Earth and Environmental Science, 48, 012030 (2016)
[14] OWL 2 Web Ontology Language. RDF-Based Semantics (Second Edition). Eds: M. Schneider, F. J. Carroll, I. Herman, P. F. Patel-Schneider. W3C Recommendation 11 December 2012. URL: http://www.w3.org/TR/2012/REC-owl2-rdf-based-semantics-20121211/
[15] Resource Description Framework (RDF): Concepts and Abstract Syntax. Eds: G. Klyne, J. J. Carroll. W3C Recommendation 10 February 2004. URL: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
[16] The GRIB Discipline Collection (2004). URL: http://vocab-test.ceda.ac.uk/collection/grib/Discipline
[17] Titov, A. G., Gordov, E. P., Okladnikov, I. G.: Hardware-Software Platform «CLIMATE» as a Basis for Local Spatial Data Infrastructure Geoportal. Vestnik NGU, Ser. Information Technologies, 10 (4), pp. 104-111 (2012) (in Russian)
[18] Volodin, E. M., Dianskii, N. A., Gusev, A. V.: Simulating Present-day Climate with the INMCM 4.0 Coupled Model of the Atmospheric and Oceanic General Circulations. Izvestiya, Atmospheric and Oceanic Physics, 46 (4), pp. 414-431 (2010)
[19] WMO Codes Registry (2013). URL: http://codes.wmo.int/grib2
[20] Zinov'ev, A. A.: Foundations of the Logical Theory of Scientific Knowledge (Complex Logic). D. Reidel Publishing Company, 264 p.