<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Computed Knowledge Base for Description of Information Resources of Water Spectroscopy</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Fazliev</string-name>
          <email>Alexander@iao.ru</email>
          <email>Fazliev@iao.ru</email>
          <email>faz@iao.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexey Privezentsev</string-name>
          <email>Alexey@iao.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Tsarkov</string-name>
          <email>tsarkov@cs.man.ac.uk</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jonathan Tennyson</string-name>
          <email>jtennyson@ucl.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Atmospheric Optics SB RAS</institution>
          ,
          <addr-line>Zuev Square 1, 634021 Tomsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University College London</institution>
          ,
          <addr-line>Gower St., London WC1E 6BU</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Manchester</institution>
          ,
          <addr-line>Oxford Road, Manchester M13 9PL</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We develop an extension to the W@DIS information system that allows one to load solutions of quantitative spectroscopy problems. These solutions are supplied with computed semantic annotations that characterise their properties. In addition to typical properties (e.g. those represented in Dublin Core), reliability properties of the solutions are also determined. Every solution is represented in a knowledge base as an individual of the quantitative spectroscopy ontology, which contains more than 10<sup>6</sup> axioms. In this paper we present the knowledge base structure and describe two classes of information source classification tasks, together with their solutions obtained by querying OWL ontologies.</p>
      </abstract>
      <kwd-group>
        <kwd>Water Spectroscopy Classification Problems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Interest in the spectral properties of the water molecule stems from its
exceptional status. Water participates in many processes on Earth, including the
processes of life, and it is one of the most studied molecules. In this paper we present a
knowledge base capable of describing the full set of spectral properties of
the water molecule at a logical level. This work requires the united effort of three
groups of professionals: specialists in spectroscopy, information systems and logic.</p>
      <p>In the last decade molecular spectroscopists have united to address key tasks
that require a cooperative solution. One of these tasks is the development of a
comprehensive representation of the spectrum of the water molecule. A suitable
theoretical strategy for representing the spectrum was formulated in the framework of
two projects [1-3]. The first protocol, “MARVEL”, is generally applicable in molecular
spectroscopy [1]. Domain specialists made a collective effort to collect
and validate both calculated and measured spectral data [2].</p>
      <p>To collect and represent spectroscopic data, the information system W@DIS
(http://wadis.saga.iao.ru) was implemented with a three-layer architecture [4]. In this
architecture the knowledge layer contains scientific annotations of the solutions of
spectroscopic problems [5]. In W@DIS these annotations are represented as individuals of
OWL ontologies.</p>
      <p>As a result of this joint work, a knowledge base describing the properties of
solutions of water spectroscopy problems was created.</p>
      <p>To solve the classification problems of quantitative water spectroscopy we use the
Description Logic reasoner FaCT++ [6,7]. The size of the data in the knowledge base
restricts the complexity of the TBox part of the ontology and forces us
to adjust the ABox structure to the capabilities of the reasoning system.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Molecular Spectroscopy Model</title>
      <p>In this work we consider the conceptualizations of two domains. The concepts
related to the description of physical system states are included in the spectroscopy
model with a high degree of granularity, while the concepts that characterize the
processes of transition from one state to another are not described in detail. De
facto, such a domain model is widely used, for example, in physics domains
in which the procedures for acquiring experimental results are well established but
the amount of data is huge and requires many years of experimental measurements. One
such domain, quantitative water spectroscopy, is described in this paper.</p>
      <p>Procedural knowledge is simplified because, in practice, the information most in
demand in the information description of quantitative spectroscopy is quantitative
information on molecular states. In quantitative spectroscopy procedural knowledge is of
interest only to a narrow circle of specialists who implement different methods of
solving domain problems.</p>
      <p>A schematic model of quantitative molecular spectroscopy, for whose information
sources the scientific annotations are composed automatically, is presented in
Fig. 1.</p>
      <p>In the domain model, the solutions of molecular spectroscopy problems
are treated as domain data. Every solution of a given problem has an annotation,
that is, a set of metadata. This set contains the properties, together with their values,
of a solution of a spectroscopic problem for a definite molecule under certain physical
conditions, published in a journal, in a monograph or on the Internet. We call
such metadata an annotation of an information resource.</p>
    </sec>
    <sec id="sec-3">
      <title>3 The Knowledge Base Structure</title>
      <p>The knowledge base (KB) consists of two parts, each represented as a set of
OWL ontologies. Fig. 2 shows the KB structure, where arrows correspond to
import statements.</p>
      <p>Fig. 2. Knowledge Base Structure</p>
      <p>The TBox contains five basic ontologies that describe the spectroscopy domain.
The DataSources ontology describes papers on spectroscopy. It also describes
properties of the solutions of water spectroscopy problems that are published in
these papers. The Spectra ontology describes the spectral characteristics of matter, the
Substance ontology describes molecules, etc.</p>
      <p>Overall, the TBox contains 75 classes, 41 object properties and 127 data properties. It
also contains 63 individuals that represent methods of solving spectroscopy problems,
physical measurement units, spectral line shapes and other physical quantities. The
expressivity of the TBox is ALCHOIN(D).</p>
      <p>The ABox of the KB contains more than a million facts obtained from the information
system (IS). These facts were generated by an IS application during the analysis of the
solutions of spectroscopy problems. The ABox is split into two parts. One part describes
properties of the solutions of water spectroscopy problems, each based on a single article
(about 55k axioms in 62 files). The other part describes the standard deviations between
solutions of the same problems from different papers and contains more than 1,300,000
axioms in 39 files.</p>
      <p>In the W@DIS information system a user can add new facts independently of
other users. This feature makes it possible to create ontologies for that user's specific
tasks. Some such tasks are described below.</p>
      <p>The KB currently contains descriptions of the properties of solutions for the water
molecule isotopomers (Molecule = H<sub>2</sub>O, H<sub>2</sub><sup>17</sup>O, H<sub>2</sub><sup>18</sup>O, HDO, HD<sup>17</sup>O, HD<sup>18</sup>O, D<sub>2</sub>O).
The parameter n (see Fig. 1) takes the values 1, 2, 3, 5, 6, 7, and the pair (i, j) takes the
values (1, 7), (2, 6), (3, 5). All components of the KB are publicly available and can be
downloaded from the IS W@DIS website (http://wadis.saga.iao.ru/saga2/ontology).</p>
    </sec>
    <sec id="sec-4">
      <title>4 Two Classification Problems</title>
      <p>The Water Spectroscopy KB allows one to classify the physical quantities
according to their properties. Here we describe two such problems.</p>
      <p>The first problem is to classify all published information sources as reliable or
unreliable. There are several such classifications because there are several validity criteria.
Table 1 illustrates the selection of reliable information sources based on the
analysis of about 800 scientific annotations. In this table an entry m (n) for a given
water isotopomer and problem T means that, out of m publications that contain a
solution of problem T, only n contain reliable solutions.</p>
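      <p>Table 1. Task / Molecule: H<sub>2</sub>O, H<sub>2</sub><sup>17</sup>O*, H<sub>2</sub><sup>18</sup>O*, HDO, HD<sup>17</sup>O, HD<sup>18</sup>O, D<sub>2</sub>O, D<sub>2</sub><sup>17</sup>O, D<sub>2</sub><sup>18</sup>O, Total</p>
      <p>Calculations of energy levels (T1): 9 (2), 4 (0), 4 (0), 1 (0), 1 (1), 15 (3)</p>
      <p>Measurement of energy levels (T7): 30 (24), 19 (15), 18 (18), 32 (28), 3 (3), 5 (4), 18 (8)</p>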
      <p>Validity was checked against the restrictions on quantum number values
that follow from the selection rules for transitions and from the restrictions on rotational
quantum numbers for energy levels. The asterisk indicates that the annotation
for a molecule was modified according to expert comments on the correction of quantum
number assignments. The solutions of the T1 and T2
problems contain a larger percentage of data arrays that in turn contain solutions with
incorrect quantum numbers. An example of a query that describes a restriction on
quantum number values is shown in Section 5.</p>
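      <p>The m (n) entries described above can be tallied from annotation records in a few lines. The following is a minimal sketch, not the W@DIS implementation: the record layout (molecule, task, reliability flag) is an assumption made for illustration, and the sample records are toy data that do not come from Table 1.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Hypothetical annotation records: (molecule, task, is_reliable).
# Toy data for illustration only; these are NOT the values of Table 1.
ANNOTATIONS = [
    ("H2O", "T1", True),
    ("H2O", "T1", False),
    ("H2O", "T7", True),
    ("HDO", "T7", True),
    ("HDO", "T7", False),
]


def count_reliable(annotations):
    """Return {(task, molecule): (m, n)}: m sources in total, n of them reliable."""
    counts = defaultdict(lambda: [0, 0])
    for molecule, task, is_reliable in annotations:
        entry = counts[(task, molecule)]
        entry[0] += 1                 # m: every source that solves the task
        entry[1] += int(is_reliable)  # n: sources that pass the validity check
    return {key: tuple(value) for key, value in counts.items()}


def format_entry(m, n):
    """Format one table cell in the m (n) notation used in the text."""
    return f"{m} ({n})"


table = count_reliable(ANNOTATIONS)
print(format_entry(*table[("T1", "H2O")]))
```
]]></preformat>
      <p>In this sketch the reliability flag is taken as given; in the KB it is derived by the reasoner from the validity criteria.</p>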
      <p>The second problem is to classify the values obtained from solutions of water
spectroscopy problems by means of their root-mean-square (standard) deviations. It can be
viewed as a refinement of the previous problem: the quantum numbers assigned to
a certain physical quantity can satisfy the validity criteria while the assignment itself
is incorrect. The value of the standard deviation helps to determine whether the values
were calculated inaccurately or an incorrect assignment was made.</p>
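      <p>A root-mean-square deviation between two sources can be sketched as follows. The text does not give the exact definition used in W@DIS, so this example assumes the usual form, the square root of the mean squared difference over the assignments that both sources report. The label strings and numerical values are hypothetical toy data.</p>
      <preformat><![CDATA[
```python
import math


def rms_deviation(values_a, values_b):
    """RMS deviation over the assignments that both sources report.

    values_a, values_b: dicts mapping a quantum-number label (a hypothetical
    string key) to a wavenumber in cm^-1.
    """
    shared = sorted(set(values_a) & set(values_b))
    if not shared:
        raise ValueError("the two sources share no assignments")
    squared = [(values_a[key] - values_b[key]) ** 2 for key in shared]
    return math.sqrt(sum(squared) / len(squared))


# Toy values for illustration only, not real spectroscopic data.
source_a = {"line-1": 100.0, "line-2": 200.0, "line-3": 300.0}
source_b = {"line-1": 100.3, "line-2": 199.6, "line-4": 400.0}
print(rms_deviation(source_a, source_b))
```
]]></preformat>
      <p>Only the two shared labels contribute here; assignments that appear in a single source are ignored by this sketch.</p>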
      <p>Reasoning over an ontology is used to answer queries about its elements.
Typical queries determine properties of a solution of a spectroscopy
problem (see Fig. 1). These queries are built by restricting the values of certain
properties. As an example, let us create a query to find all canonical information
sources for the water isotopomer H<sub>2</sub><sup>17</sup>O and problems T1 and T7 (in a canonical
information source the quantum numbers satisfy all chosen criteria, expressed as
selection rules):</p>
      <preformat>InformationSource that hasSubstance value H2_17O
  and hasOutputData_MD some (hasTransitionQuantumNumbers_MD some
    (hasQuantumNumbersType value NormalModes
     and hasNumberOfNonuniqueTransitions some {0}
     and (hasNumberOfUnlabeledTransitions some {0}
          or hasNumberOfUnassignedTransitions some {0})
     and hasNumberOfInvalidIdentifications some {0}
     and hasNumberOfInvalidTransitions some {0}
     and hasNumberOfInvalidWaterTransitions some {0}
     and hasNumberOfInvalidWater-C2V-Transitions some {0}
     and hasNumberOfRejectedTransitions some {0}))</preformat>
      <p>The query structure depends on the type of the problems (here T1 and T7) and the
symmetry group of the molecule (here C<sub>2v</sub>), which together determine the number of
selection rules. Similar queries allow one to separate information sources into canonical
and non-canonical ones. Note that some domains have weaker notions of canonicity, so the
query can be simplified.</p>
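      <p>The restrictions of the canonicity query can be mirrored outside the reasoner as a plain predicate. This is only a sketch: the dictionary layout and the key quantum_numbers are hypothetical, the counter names are copied from the query, and the check is closed-world (a missing counter counts as a failure), which differs from the open-world semantics of an OWL reasoner such as FaCT++.</p>
      <preformat><![CDATA[
```python
# Counters that must all be zero, named after the properties in the query.
REQUIRED_ZERO = (
    "hasNumberOfNonuniqueTransitions",
    "hasNumberOfInvalidIdentifications",
    "hasNumberOfInvalidTransitions",
    "hasNumberOfInvalidWaterTransitions",
    "hasNumberOfInvalidWater-C2V-Transitions",
    "hasNumberOfRejectedTransitions",
)
# At least one of these counters must be zero (the "or" branch of the query).
EITHER_ZERO = (
    "hasNumberOfUnlabeledTransitions",
    "hasNumberOfUnassignedTransitions",
)


def is_canonical(source, substance="H2_17O"):
    """Check one information source given as a plain dict (hypothetical layout)."""
    qn = source.get("quantum_numbers", {})
    return (
        source.get("hasSubstance") == substance
        and qn.get("hasQuantumNumbersType") == "NormalModes"
        and all(qn.get(name) == 0 for name in REQUIRED_ZERO)
        and any(qn.get(name) == 0 for name in EITHER_ZERO)
    )


# Toy record for illustration only.
example = {
    "hasSubstance": "H2_17O",
    "quantum_numbers": {
        "hasQuantumNumbersType": "NormalModes",
        **{name: 0 for name in REQUIRED_ZERO},
        "hasNumberOfUnlabeledTransitions": 0,
        "hasNumberOfUnassignedTransitions": 3,
    },
}
print(is_canonical(example))
```
]]></preformat>
      <p>A weaker notion of canonicity would correspond to shortening the tuple of counters that must be zero.</p>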
      <p>Another information source classification problem can be solved using
restrictions on the value of the standard deviation of some physical quantity. Here is an
example of a query (PublicationsWithLargeDeviation_Band_0_5_1_0_5_0) that
finds all publications in which the standard deviation for the
vibrational band 051-050 exceeds 0.1 cm<sup>-1</sup> (most of the details related to the
physics of spectroscopy are omitted here):</p>
      <preformat>isRMSMemberOf some (hasRMSBandPair some (hasRMSTransitionBand some
  (hasTransitionQuantumNumbersOfBand value Identification_on_NormalModes_0_5_1_0_5_0_Band
   and hasBandRMSDeviationValue some float[&gt; 0.1])))</preformat>
      <p>This query can be useful for a user who requires physical parameters related
to a given vibrational band, in the following way. If the class
PublicationsWithLargeDeviation_Band_0_5_1_0_5_0 defined by the query is non-empty (i.e., contains
some information sources), then these sources are unreliable w.r.t. the given band. In this
case an additional check of the values of the corresponding physical characteristics is
necessary.</p>
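      <p>The same threshold test can be illustrated on plain records. The sketch below assumes that each record holds a publication identifier, a band label and a precomputed deviation in cm^-1; this layout, the identifiers and the values are hypothetical, and only the 0.1 threshold is taken from the text.</p>
      <preformat><![CDATA[
```python
THRESHOLD_CM1 = 0.1  # threshold from the text, in cm^-1

# Hypothetical records: (publication id, band label, RMS deviation in cm^-1).
# Toy values for illustration only.
BAND_DEVIATIONS = [
    ("pub-A", "051-050", 0.25),
    ("pub-B", "051-050", 0.04),
    ("pub-C", "010-000", 0.30),
]


def publications_with_large_deviation(records, band, threshold=THRESHOLD_CM1):
    """Return the sorted ids of publications whose deviation for `band` exceeds `threshold`."""
    return sorted({pub for pub, rec_band, rms in records
                   if rec_band == band and rms > threshold})


flagged = publications_with_large_deviation(BAND_DEVIATIONS, "051-050")
print(flagged)
```
]]></preformat>
      <p>A non-empty result plays the role of a non-empty query class: the listed sources need an additional check for that band.</p>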
    </sec>
    <sec id="sec-5">
      <title>6 Conclusions</title>
      <p>In this paper we present a model of quantitative molecular spectroscopy. We
describe the information system W@DIS, which contains data about the solutions of
molecular spectroscopy problems from published papers. In particular, it contains
the complete set of facts about solutions of water spectroscopy problems published in the
last 60 years.</p>
      <p>Using this model as a knowledge domain, we develop a knowledge base that
describes properties of the solutions of common problems in the domain. We
describe two classes of information source classification problems that can be
solved in the KB. We provide examples of queries to the KB that describe classes in
the OWL ontology that solve these problems.</p>
      <p>As part of future work we plan to extend the KB with facts about other
important molecules, including methane, carbon dioxide, carbon monoxide, acetylene,
ammonia, and others.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Furtenbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Császár</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Tennyson</surname>
          </string-name>
          ,
          <article-title>MARVEL: measured active rotational-vibrational energy levels</article-title>
          ,
          <source>J. Molec. Spectrosc.</source>
          , v.
          <volume>245</volume>
          ,
          <year>2007</year>
          , p.
          <fpage>115</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Tennyson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.F.</given-names>
            <surname>Bernath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.R.</given-names>
            <surname>Brown</surname>
          </string-name>
          , et al,
          <article-title>IUPAC Critical Evaluation of the Rotational-Vibrational Spectra of Water Vapor. Part I. Energy Levels and Transition Wavenumbers for H<sub>2</sub><sup>17</sup>O and H<sub>2</sub><sup>18</sup>O</article-title>
          ,
          <source>J. Quant. Spectr. Rad. Transfer</source>
          ,
          <year>2009</year>
          , v.
          <volume>110</volume>
          , p.
          <fpage>573</fpage>
          -
          <lpage>596</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          IUPAC project No. 2004-035-1-100,
          <article-title>A database of water transitions from experiment and theory</article-title>
          . http://www.iupac.org/web/ins/2004-035-1-100.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>De Roure</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jennings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shadbolt</surname>
          </string-name>
          ,
          <article-title>A Future e-Science Infrastructure</article-title>
          , Report commissioned for EPSRC/DTI Core e-Science Programme,
          <year>2001</year>
          , 78 p.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>A.I.</given-names>
            <surname>Privezentsev</surname>
          </string-name>
          ,
          <article-title>Ontological knowledge base implementation and software for information resources description in molecular spectroscopy</article-title>
          ,
          <source>PhD Dissertation</source>
          , Tomsk State University,
          <year>2009</year>
          , 238 p.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Dmitry</given-names>
            <surname>Tsarkov</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ian</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <article-title>FaCT++ Description Logic Reasoner: System Description</article-title>
          ,
          <source>Proc. of the Int. Joint Conf. on Automated Reasoning (IJCAR 2006), Lecture Notes in Artificial Intelligence</source>
          , v.
          <volume>4130</volume>
          , Springer,
          <year>2006</year>
          , p.
          <fpage>292</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Dmitry</given-names>
            <surname>Tsarkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ian</given-names>
            <surname>Horrocks</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peter F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          ,
          <article-title>Optimizing Terminological Reasoning for Expressive Description Logics</article-title>
          ,
          <source>J. of Automated Reasoning</source>
          , v.
          <volume>39</volume>
          (
          <issue>3</issue>
          ),
          <year>2007</year>
          , p.
          <fpage>277</fpage>
          -
          <lpage>316</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>