=Paper= {{Paper |id=Vol-2523/paper12 |storemode=property |title= Applied Ontologies for Managing Graphic Resources in Spectroscopy |pdfUrl=https://ceur-ws.org/Vol-2523/paper12.pdf |volume=Vol-2523 |authors=Nikolai Lavrentiev,Alexey Privezentsev,Alexander Fazliev |dblpUrl=https://dblp.org/rec/conf/rcdl/LavrentievPF19 }} == Applied Ontologies for Managing Graphic Resources in Spectroscopy == https://ceur-ws.org/Vol-2523/paper12.pdf
     Applied Ontologies for Managing Graphic Resources
                      in Spectroscopy

                       Nikolai Lavrentev, Alexey Privezentsev, and
                            Alexander Fazliev[0000-0003-2625-3156]
               Institute of Atmospheric Optics SB, RAS, Tomsk 634055, Russia
                   lnick@iao.ru, remake@iao.ru, faz@iao.ru

        Abstract The report presents the tasks of graphical resource management and
        describes in detail the applied ontologies of the GrafOnto collection of research
        graphics used for solving problems of spectroscopy. The problems of ontology
        modularity and automatic generation of classes are discussed. Examples of solving
        the reduction problem as well as metrics of the applied ontologies are presented.

        Keywords Research Graphical Resources Classification, Spectroscopic Graph-
        ical Resources Ontology


1       Introduction

In the mid-2000s, the emergence of digital scientific libraries and of the Semantic Web
approach, which is oriented toward the semantic description of information resources,
stimulated work on decomposing resources into smaller parts. Such parts require semantic
annotations that describe the domains and the various data representations used in them.
Scientific publications always use various forms of data representation (text, tables,
graphics, symbols such as formulas, etc.). At the same time, researchers obtained
facilities for storing and presenting large amounts of information, although published
data and information were needed to control the quality of this information. Virtual data
centers in various domains appeared in the second half of the 2000s. These data centers
usually contained published data presented in publications in tabular form. At the end of
the 2000s, publications on the systematization of scientific graphical resources started
to appear [1–4]. An example of an approach to creating a collection of graphical resources
in High Energy Physics is presented in Ref. [5].
   The report presents the results of the final stage of the systematization of scientific
plots in three disciplines of spectroscopy. At the first stage we formed the GrafOnto
collection of graphical resources [6–10], which describes the results of studies on the
continuum of spectral lines of the water molecule, on the spectral properties of weakly
bound complexes, and on absorption cross-sections used for calculating the rates of
photochemical reactions. At the second stage, the typification of plots and figures was
carried out and the first version of the GrafOnto resources ontology was built (see
Refs. [11–13]).
   In order to upload new datasets into the GrafOnto system and maintain them, one has to
solve tasks of graphical resource management. These tasks include the specification of the
structure of information resources for spectroscopy problems, the analysis of the validity
of resources, the control of data completeness, and trust estimation. The decision support
system used in the management of the GrafOnto collection is based on ontologies describing
primitive and composite plots and figures. The description of these ontologies is the aim
of this report.

    Copyright © 2019 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).


2        GrafOnto Collection of Scientific Graphics

The collection is based on a digital library containing more than a thousand articles.
These articles are devoted to spectroscopy research on the spectral line continuum, the
properties of weakly bound complexes, and spectral functions in the near and far
ultraviolet range. A distinctive feature of these spectroscopy problems is that the major
part of the published data is presented in the form of plots, figures and images.
   To create the collection, graphical objects have to be manually extracted and converted
into digital form. The software used to upload, store, view, search and integrate
graphical resources into the collection is original. At present, the collection contains
about 3000 primitive plots included in 625 composite plots and 104 composite figures, and
about 4000 more primitive plots are ready for upload. The uploaded plots describe
properties of 19 molecules, 25 complexes and 50 mixtures. Almost half of the primitive
plots characterize properties of the water molecule. The plots of the collection are
related to dozens of physical quantities used as functions and about a dozen physical
quantities used as arguments. Table 1 lists the most frequently used functions and the
number of primitive plots related to them for each substance group. It is worth noting
that, at present, only those plots from the publications that were chosen by experts are
uploaded into the collection. The other plots will be processed automatically after the
software for machine processing of graphical resources is developed. The collection of
plots that has already been created will be used as a data set for training a neural
network aimed at automatic recognition of scientific graphics.
    Table 1. Number of plots with the functions that are most frequently used in the collection

 Function                                        Number of primitive plots
                                            Mixture    Molecule    Complex    Total
 Absorption Coefficient (cm⁻¹)                  20        103         17       140
 Absorption Coefficient (cm⁻¹ atm⁻¹)            14         38                   52
 Absorption Coefficient (cm² mol⁻¹ atm⁻¹)       95        567         27       689
 Absorption Coefficient (dB/km)                 14        128         11       153



3        Managing Graphical Resources in GrafOnto Collection

The principal tasks of graphical resource management are to control the structure of the
resources and the quality of the data. An ontology knowledge base that accumulates all
computer-generated information on the collection components is used for making decisions
during the management.




   The resource structure comprises plots of various types and their descriptions,
substances, functions and their arguments, units of physical quantities, a table of units,
coordinate systems and the level of detail of their description, etc. Control of the
validity of plots and figures is based on the analysis of calculated values of paired
relations between cited plots and the original plots related to them. Such a relation is
characterized by a reference to a publication, a figure number and a curve identifier.
Note that, at present, the collection of cited plots contains 693 primitive and 248
composite plots. The ontology describing the present state of the collection resources is
presented below.
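
As an illustration of the validity control described above, the following minimal Python
sketch pairs a cited curve with its original and computes one possible value of the paired
relation. The CurveKey record mirrors the three attributes named in the text (reference,
figure number, curve identifier). The deviation measure (mean absolute relative difference
at shared abscissas), the tolerance and all sample values are assumptions made for
illustration and are not taken from GrafOnto.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CurveKey:
    """Identifies a curve: publication reference, figure number, curve id."""
    reference: str
    figure_number: int
    curve_id: str


# A dataset maps an abscissa value to an ordinate value (hypothetical layout).
Dataset = Dict[float, float]


def mean_relative_deviation(cited: Dataset, original: Dataset) -> float:
    """Mean of |y_cited - y_orig| / |y_orig| over abscissas present in both curves."""
    shared = [x for x in cited if x in original and original[x] != 0.0]
    if not shared:
        raise ValueError("curves share no comparable points")
    total = sum(abs(cited[x] - original[x]) / abs(original[x]) for x in shared)
    return total / len(shared)


def check_pair(cited: Tuple[CurveKey, Dataset],
               original: Tuple[CurveKey, Dataset],
               tolerance: float = 0.05) -> bool:
    """Return True when the cited curve reproduces the original within tolerance."""
    return mean_relative_deviation(cited[1], original[1]) <= tolerance


# Made-up sample data, for illustration only.
original_curve = (CurveKey("Ref-A", 3, "curve-1"), {100.0: 2.0, 200.0: 4.0, 300.0: 8.0})
cited_curve = (CurveKey("Ref-B", 7, "curve-2"), {100.0: 2.1, 200.0: 4.0, 300.0: 7.6})

deviation = mean_relative_deviation(cited_curve[1], original_curve[1])
is_valid = check_pair(cited_curve, original_curve)
```

Any other closeness measure could be substituted for mean_relative_deviation; the sketch
only shows where such a value would enter a decision about a cited plot.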


4      Applied Ontologies of Scientific Plots and Figures
       in Spectroscopy

The taxonomy of some of the most important artifacts of research publishing [5] includes
the following concepts: figure (composite figure, plot (exclusion area plot,
GenericFunctionPlot, histogram), diagram, picture). In our work we defined additional
concepts characterized by the methods of acquiring physical quantities (FTP, Cell, etc.),
by their types (Theoretical, Experimental, Fitting, Asymptotic), by slang names of
physical quantities, etc., and declared them as subclasses of the GenericFunctionPlot
class. These definitions are oriented toward the physical quantities used as plot axes.
   We defined the following hierarchy for forming ontologies in the spectroscopy domain.
The basic ontology of spectroscopy graphical resources contains three parts, each related
to one of three problems of spectroscopy: continuum absorption, weakly bound complexes,
and the specific task of spectral functions related to photochemical reactions in the
atmosphere.


4.1    Basic Ontology and Applied Ontologies of Domain Problems

The basic ontology contains classes and properties that are used in the applied ontologies
of domain problems. In our case, these problems are weakly related to each other and are
represented by the following independent modules: graphical resources of continuum
absorption, of weakly bound complexes, and of absorption profiles defining the rates of
photochemical reactions. Each of these modules is split into three parts: the first part
characterizes the coordinate systems used in the GrafOnto collection, the second
characterizes physical quantities, and the third characterizes the substances whose
properties are presented in the collection.


4.2    Main Classes

In the ontologies, classes define sets of resources, which in our work are plots and
figures from the GrafOnto collection and descriptions of their properties. All classes are
explicitly defined in OWL 2 using the Manchester syntax.




Basic ontology classes
   Within the chosen model, the main entities in spectroscopy are substances (the
Substance class), i.e. molecules, complexes and mixtures, and the methods of acquiring the
values of physical quantities (Method). Graphical representation is related to the
graphical system entity (GraphicalSystem). The components of a graphical system are, for
example, the coordinate axes of plots representing physical entities. In GrafOnto, each
published plot or figure is related to a description of its properties (the Description
and ResearchPlotDescription classes). One such property is a bibliographic reference to a
publication (Reference). The Problem class contains three individuals (Continuum, Complex
and CrossSection), each identifying a problem related to a graphical resource.
Classes related to domain problems’ ontologies
   Domain problems are closely related to the tasks for their solution. The GrafOnto
collection contains graphical resources related to the problems mentioned in the
introductory part of this section. Classes of the ontologies of spectroscopy problems
contain sets of resources and their descriptions. The PhysicalQuantity class consists of
two disjoint subclasses named SystemPhysicalQuantityDepended and
SystemPhysicalQuantityIndepended. The first class contains the physical quantities whose
dependence on other physical quantities is presented in plots and figures (functions),
while the second one contains the physical quantities on which this dependence is
presented (arguments).
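
The two subclasses of PhysicalQuantity described above do not share members; in OWL this
is stated with a disjointness axiom. The short Python sketch below only assembles the text
of two class frames in Manchester syntax. The helper name class_frame is hypothetical, and
the frames show one plausible way to state the axioms rather than the exact declarations
in the GrafOnto code.

```python
from typing import Sequence


def class_frame(name: str,
                parents: Sequence[str] = (),
                disjoint: Sequence[str] = ()) -> str:
    """Build a Manchester-syntax class frame as plain text."""
    lines = [f"Class: {name}"]
    if parents:
        lines.append("    SubClassOf: " + ", ".join(parents))
    if disjoint:
        lines.append("    DisjointWith: " + ", ".join(disjoint))
    return "\n".join(lines)


# Stating disjointness on one of the two classes is sufficient.
dependent = class_frame(
    "SystemPhysicalQuantityDepended",
    parents=["PhysicalQuantity"],
    disjoint=["SystemPhysicalQuantityIndepended"],
)
independent = class_frame(
    "SystemPhysicalQuantityIndepended",
    parents=["PhysicalQuantity"],
)

print(dependent)
print(independent)
```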
   To understand the names of ontology classes, we first describe how they are formed. A
name may consist of several words. These words correspond to the names of individuals in
the classes MethodType, SystemPhysicalQuantityDepended and Substance. Fig. 1 presents, in
simplified syntax, examples of schemes for creating subclass names in the A classes
(Physical quantity and related substances) and the B classes (Substance and related
physical quantities).




                  Fig. 1. Word order in the names of A and B groups’ classes
   For example, a class named
Description_Experimental_Absorption_Coefficient__cm2mol_1atm_1_ contains all the
descriptions of measured absorption coefficients with the dimension cm² molecule⁻¹ atm⁻¹
for a series of substances and is a subclass of Physical quantity and related substances.
The third group of classes, related to subclasses of the GraphicalSystem class, is not
presented in this work.
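
The example name above is consistent with a simple rule: join the words with underscores
and replace every character that is not a letter or a digit by an underscore. The Python
sketch below applies that rule. It is an assumption inferred from this single example,
not the documented GrafOnto generator, and the function name class_name and the input
strings are illustrative.

```python
import re


def class_name(*words: str) -> str:
    """Join name parts with '_' and map every non-alphanumeric character to '_'."""
    return re.sub(r"[^A-Za-z0-9]", "_", "_".join(words))


# Parts: description prefix, method type, physical quantity with its unit.
name = class_name(
    "Description",
    "Experimental",
    "Absorption Coefficient (cm2mol-1atm-1)",
)
print(name)
```

Under this rule the space and the opening parenthesis before the unit yield the double
underscore seen in the class name, and the closing parenthesis yields the trailing one.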




4.3    Main Properties
Natural-language comments for all the properties used are given in the OWL 2 code of the
ontologies. Here we present a simplified classification of some properties related to
physical quantities and to descriptions of plots and figures. The description of
properties related to the Description and CoordinateSystem classes as well as to the
quantitative characteristics Temperature and Pressure is omitted.
   Table 2 lists the ontology properties together with their domains and ranges. The last
column of the table gives the abbreviations of the properties used in the scheme of the
individual presented in Fig. 2.
   The qualitative properties characterizing physical quantities are hasOriginType,
hasSourceType and hasMethodType. The values of the hasOriginType property indicate the
origin of the dataset related to the plot: it is either original or obtained by digitizing
the curve of a primitive plot. The values of the hasSourceType property describe primary
data, i.e. data obtained by the authors of the publication, previously published curves
(cited), and commonly known curves (expert). The values of the hasMethodType property
characterize how the datasets of a primitive plot were acquired: Theoretical is a
calculation using a physical or mathematical model, Experimental is a measurement, and
Fitting is the creation of a continuous curve by fitting to experimental values.
   The relations between plots and figures are defined by six properties (has{OPPD, CPPD,
OCPD, CCPD, MCPD} and hasPrototype). The first five are mereological properties that
describe the composition of composite plots (OPPD, CPPD) and figures (OCPD, CCPD, MCPD).
The value of the hasPrototype property, used in the description of a cited primitive
plot, is the corresponding original plot. This property links descriptions that contain
datasets with closely related values.
4.4    Main Types of Individuals
Images generated in the GrafOnto system are equivalents of the figures and plots from
published graphical resources on the above problems. They are related to descriptions of
their metadata, which make up the most significant part of the ontology individuals
included in the A-box. The typification of figures and plots given in Ref. [14] is defined
by property values. Abbreviations of the corresponding values are used in the names of
such individuals (for example, OCP stands for Original Composite Plot). Fig. 2 illustrates
the structure of one such plot type, the original primitive plot. Ovals stand for ontology
individuals, rectangles stand for literals, and directed arcs stand for object (OP) and
datatype (DTP) properties. Cited primitive plots have a similar structure, with additional
statements that use the hasPrototype, hasChild and hasParent properties. Special cases of
individuals characterizing the properties of a coordinate system and its axes are shown in
the lower part of Fig. 2. A number of individuals belong to classes defined by the
enumeration of their individuals.




         Table 2. Main properties of base ontology and spectroscopy problems ontology

Primitive Plot Description (PPD)
Domain                             Property                     Range                                Abbr
Description                        hasReference                 Reference                            OP1
PrimitivePlotDescription           hasSubstance                 Substance                            OP2
PrimitivePlotDescription           hasSourceType                {Primary, Expert, Cited}             OP3
PrimitivePlotDescription           hasOriginType                {Digitized, Original}                OP4
PrimitivePlotDescription           hasCurveType                 {Line, Point}                        OP5
PrimitivePlotDescription           hasCS                        CoordinateSystem                     OP6
CitedPrimitivePlotDescription      hasCitedReference            Reference                            OP7
CoordinateSystem                   hasCSType                    {2D-Decartes}                        OP8
CoordinateSystem                   hasX-axis                    X-axis                               OP9
CoordinateSystem                   hasY-axis                    Y-axis                               OP10
Y-axis                             hasMethod                    Method                               OP11
Y-axis                             hasMethodType                {Theory, Experiment, Fitting}        OP12
Y-axis                             hasPY-axis                   PubPhysQuanDepended                  OP13
Y-axis                             hasSY-axis                   SysPhysQuanDepended                  OP14
X-axis or Y-axis                   hasAxisScale                 {Linear, Logarithmic}                OP15
X-axis                             hasPX-axis                   PubPhysQuanIndep                     OP16
X-axis                             hasSX-axis                   SysPhysQuanIndep                     OP17
CitedPrimitivePlotDescription      hasPrototype                 OriginalPrimitivePlotDescription     OP18
PrimitivePlotDescription           hasTemperature               float                                DT1
PrimitivePlotDescription           hasPressure                  float                                DT2
PrimitivePlotDescription           hasSystemFigureNumber        integer                              DT3
ResearchFigureDescription          hasOriginalImageOfPlot       URI                                  DT4
ResearchFigureDescription          hasOriginalPlotInformation   URI                                  DT5
FigureDescription                  hasFigureCaption             string                               DT6
FigureDescription                  isPartOfFigureNumber         integer                              DT7
ResearchFigureDescription          hasNumberOfPoints            integer                              DT9
ResearchFigureDescription          hasPlotCaption               string                               DT12
Original Composite Plot Description (OCPD)
OriginalCompositePlotDescription   hasOPPD                      OriginalPrimitivePlotDescription     OP19
Cited Composite Plot Description (CCPD)
CitedCompositePlotDescription      hasCPPD                      PrimitivePlotDescription             OP20
Composite Figure Description (CFD)
CompositeFigureDescription         hasOCPD                      OriginalCompositePlotDescription     OP21
CompositeFigureDescription         hasCCPD                      CitedCompositePlotDescription        OP22
CompositeFigureDescription         hasMCPD                      MultipaperCompositePlotDescription   OP23
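
Table 2 can be read as a set of domain and range constraints. As a minimal sketch, the
Python snippet below encodes a few enumerated rows of the table and checks whether the
property values of one individual respect them. The dictionary layout, the function name
violations and the sample values are assumptions for illustration. Note also that an OWL
reasoner infers class membership from domains and ranges rather than rejecting statements,
so this is a plain consistency check and not OWL semantics.

```python
from typing import Dict, List, Set, Tuple

# property -> (domain class, allowed values), copied from rows OP3-OP5 and OP12.
ENUM_PROPERTIES: Dict[str, Tuple[str, Set[str]]] = {
    "hasSourceType": ("PrimitivePlotDescription", {"Primary", "Expert", "Cited"}),
    "hasOriginType": ("PrimitivePlotDescription", {"Digitized", "Original"}),
    "hasCurveType": ("PrimitivePlotDescription", {"Line", "Point"}),
    "hasMethodType": ("Y-axis", {"Theory", "Experiment", "Fitting"}),
}


def violations(subject_class: str, statements: Dict[str, str]) -> List[str]:
    """List problems found in one individual's property values.

    Subclass relations are ignored: the subject class must match the domain exactly.
    """
    problems = []
    for prop, value in statements.items():
        if prop not in ENUM_PROPERTIES:
            problems.append(f"{prop}: not covered by this sketch")
            continue
        domain, allowed = ENUM_PROPERTIES[prop]
        if subject_class != domain:
            problems.append(f"{prop}: domain is {domain}, got {subject_class}")
        if value not in allowed:
            problems.append(f"{prop}: {value!r} is not one of {sorted(allowed)}")
    return problems


good = violations("PrimitivePlotDescription",
                  {"hasSourceType": "Cited", "hasOriginType": "Digitized"})
bad = violations("PrimitivePlotDescription",
                 {"hasCurveType": "Dashed", "hasMethodType": "Theory"})
print(good)
print(bad)
```

Here the first call reports no problems, while the second reports a value outside the
hasCurveType enumeration and a hasMethodType statement attached to the wrong domain.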




Fig. 2. Structure of individual characterizing original primitive plot from Fig. 4 in Ref [14]




4.5 Ontologies Metrics

Ontology metrics are used for comparing ontologies of different parts of a domain or of
different domains, and they characterize quantitative and qualitative features of an
ontological description. In OWL ontologies the number of object properties characterizes
the number of paired relations between individuals. Some of these individuals may have
quantitative estimates; such estimates are described by datatype properties.
   Table 3 contains metrics for the applied ontologies of the three spectroscopy problems
and for the union of these ontologies (Σ Ontology). For the GrafOnto collection, the equal
numbers of properties, their domains and their ranges mean that the three ontologies are
characterized by identical properties. However, the difference in the numbers of classes
indicates that a greater number of spectral functions is used in the Continuum problem
than in the Complex and Cross Section problems. Since individuals of the same group of
types are used in all applied ontologies, we may conclude that the ontology of the
Continuum problem describes the largest number of primitive plots.
   Table 3. Ontology metrics, characterizing the spectroscopy graphical resources collection

                                  Continuum        Complex       Cross Section   Σ Ontology
                                                Metrics
  Axiom                            53279              12737         10704          84662
  Logical axiom count              43180              10171          8634          68587
  Declaration axiom count           8187              2134           1715          13135
  Class count                       295                205            55            552
  Object properties count            24                24             24             24
  Datatype properties count          14                14             14             14
  Individual count                  7910              1984           1641          12578
  DL expressivity                 ALCO(D)          ALCO(D)        ALCO(D)        ALCO(D)
                                       Object property axioms
  Object properties domains          24                24             24             24
  Object properties ranges           24                24             24             24
                                      Datatype property axioms
  Datatype properties domains        14                14             14             14
  Datatype properties ranges         14                14             14             14
                                           Individual axioms
  Class assertion                   426                204           206            854
  Object properties assertion      27971              6231           5590          44152
  Datatype properties assertion    14204              3314           2683          22516
                                           Annotation axioms
  Annotation assertions             1912               432           355            2940
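
The counts in Table 3 can be cross-checked arithmetically. In every column the reported
number of axioms equals the sum of the logical axiom count, the declaration axiom count
and the annotation assertions, whereas the Σ Ontology total is not a simple sum of the
three problem columns. The Python sketch below reproduces both observations with values
copied from the table; the dictionary layout and the function name are illustrative
choices.

```python
# Values copied from Table 3: (axioms, logical, declaration, annotation assertions).
METRICS = {
    "Continuum": (53279, 43180, 8187, 1912),
    "Complex": (12737, 10171, 2134, 432),
    "Cross Section": (10704, 8634, 1715, 355),
    "Sigma Ontology": (84662, 68587, 13135, 2940),
}


def axiom_breakdown_holds(total: int, logical: int, declaration: int, annotation: int) -> bool:
    """True when the three axiom groups add up to the reported total."""
    return logical + declaration + annotation == total


consistent = {name: axiom_breakdown_holds(*row) for name, row in METRICS.items()}

# Sum of the three problem ontologies versus the reported Sigma Ontology total.
parts_total = sum(METRICS[name][0] for name in ("Continuum", "Complex", "Cross Section"))
print(consistent)
print(parts_total, METRICS["Sigma Ontology"][0])
```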




   A comparison of the metrics of Σ Ontology with those of the ontologies of tabular
information resources (Ref. [14]) shows that, in our work on the graphical resources
ontology, the number of classes increased significantly within one year. This indicates
that Σ Ontology contains the largest number of explicit answers to typical user requests.


5   Conclusion

The report presents applied ontologies of scientific plots and figures used for managing
graphical resources in three problems of spectroscopy. The ontologies describe a
collection of plots and figures published in the period from 1918 to 2018. They were
created for managing the structure of the collection resources and for supporting
decisions on such tasks as the development, storage and systematization of plots and
figures for the problems of continuum absorption, the properties of weakly bound
complexes, and absorption cross sections. The ontology, together with its individuals and
classes, is generated automatically as the collection grows.
   The future of the GrafOnto collection is related to the automatic recognition of plots
and figures used in spectroscopy and to the generation of applied ontologies
characterizing the validity analysis and confidence estimation of its resources.


References
 1. Halpin, H. and Presutti, V.: An ontology of resources for linked data. Linked Data on the
    Web 2009, Madrid, Spain. ACM 978-1-60558-487-4/09/04
 2. Thorsen, H. and Pattuelli, C.M.: Ontologies in the time of linked data. In Smiraglia, Richard
    P., ed. Proceedings from North American Symposium on Knowledge Organization 5, 1–15
    (2015).
 3. Niknam, M. and Kemke, C.: Modeling shapes and graphics concepts in an ontology.
    https://pdfs.semanticscholar.org/c20b/3b819ce253715bbfa9c2151a10ea87f718e4.pdf
 4. Kalogerakis, E., Christodoulakis, S., and Moumoutzis, N.: Coupling ontologies with
    graphics content for knowledge driven visualization.
    https://people.cs.umass.edu/~kalo/papers/graphicsOntologies/graphicsOntologies.pdf
 5. Praczyk, P.A.: Management of scientific images: an approach to the extraction, annotation
    and retrieval of figures in the field of High Energy Physics. Thesis Doctoral, Universidad de
    Zaragoza (2013). ISSN 2254-7606
 6. Voronina, Yu.V., Lavrentiev, N.A., Privezentzev, A.I., and Fazliev, A.Z: Collection of pub-
    lished plots on water vapor absorption cross sections. Proc. SPIE 10833 (2018). doi:
    10.1117/12.2504586s
 7. Lavrentiev, N.A., Rodimova, O.B., Fazliev, A.Z., and Vigasin, A.A.: Systematization of
    published research plots in spectroscopy of weakly bounded complexes of molecular oxygen
    and nitrogen. Proc. SPIE 10833 (2018). doi: 10.1117/12.2504327
 8. Lavrentiev, N.A., Rodimova, O.B., and Fazliev, A.Z.: Systematization of published scien-
    tific graphics characterizing the water vapor continuum absorption: I. Publications of 1898–
    1980. Proc. SPIE 10833 (2018). doi: 10.1117/12.2504325
 9. Lavrentiev, N.A., Rodimova, O.B., Fazliev, A.Z., and Vigasin A.A.: Systematization of pub-
    lished research graphics characterizing weakly bound molecular complexes with carbon di-
    oxide. Proc. SPIE 104660E (2017). doi: 10.1117/12.2289932




10. Lavrentiev, N.A., Rodimova, O.B., and Fazliev, A.Z.: Systematization of graphically plotted
    published spectral functions of weakly bound water complexes. Proc. SPIE 10035
    (2016). doi: 10.1117/12.2249159
11. Lavrentiev, N.A., Privezentsev, A.I., and Fazliev, A.Z.: Tabular and Graphic Resources in
    Quantitative Spectroscopy. In: L. Kalinichenko et al. (eds.) DAMDID/RCDL 2018, CCIS
12. Lavrentiev, N.A., Privezentsev, A.I., and Fazliev, A.Z.: Systematization of Tabular and
    Graphical Resources in Quantitative Spectroscopy. CEUR Workshop Proceedings, Selected
    Papers of the XX International Conference on Data Analytics and Management in Data In-
    tensive Domains. Edited by Leonid Kalinichenko, Yannis Manolopoulos, Sergey Stupnikov,
    Nikolay Skvortsov, Vladimir Sukhomlin 2277, 25–32 (2018).
13. Lavrentiev, N.A., Privezentsev, A.I., and Fazliev, A.Z.: Applied Ontology of Molecule
    Spectroscopy Scientific Plots. Proc. of Conference "Knowledge, Ontologies, Theories",
    DigitPro 2, 36–40 (2017).
14. Odintsova, T.A., Tretyakov, M.Yu., Pirali, O., and Roy, P.: Water vapor continuum in the
    range of rotational spectrum of H 2 O molecule: New experimental data and their comparative
    analysis. Journal of Quantitative Spectroscopy and Radiative Transfer 187, 116–123 (2017).
    doi: 10.1016/j.jqsrt.2016.09.00



