<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Relevance Evaluation of Information Retrieval in the Integration of Information Systems on Inorganic Substances Properties</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Igor Temkin</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>A.A. Baikov Institute of Metallurgy and Materials Science of RAS (IMET RAS)</institution>
          ,
          <addr-line>Moscow, 119334</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>Moscow, 109028</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National University of Science and Technology MISIS (Moscow Institute of Steel and Alloys)</institution>
          ,
          <addr-line>Moscow, 119049</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>348</fpage>
      <lpage>357</lpage>
      <abstract>
        <p>One of the main tasks in the integration of information systems is to provide relevant retrieval of information consolidated from heterogeneous sources. In the field of inorganic chemistry and materials science, set-theoretic methods of searching for relevant information are known. They ensure the construction of a sufficiently high-quality response to user requests. However, the problem of quantifying evaluation of information search relevance in this subject area remains open. This paper proposes an approach to quantifying evaluation of the relevance of information retrieval in integrated systems on inorganic substances and materials properties.</p>
      </abstract>
      <kwd-group>
        <kwd>relevance evaluation</kwd>
        <kwd>database integration</kwd>
        <kwd>inorganic substances</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The development and use of integrated information systems on the properties of substances and
materials, which consolidate information from heterogeneous sources, is a
worldwide trend. These systems let specialists quickly
find the required information. When developing such systems, the fundamental issue is
the data representation method used to describe chemical objects and their
properties. This representation, in turn,
determines the class of methods available for searching for relevant information and their
functionality. The purpose of this paper is to present a new approach to the quantitative
evaluation of the relevance of information retrieval in integrated information systems
(IS) on inorganic substances and materials properties (ISMP), based on information
structures describing the qualitative and/or quantitative composition of a substance.</p>
    </sec>
    <sec id="sec-2">
      <title>The current state of the problem</title>
      <sec id="sec-2-1">
        <title>Heterogeneous information systems</title>
        <p>
          The development of information technologies and the emergence of powerful hardware
and software tools for storing and processing information have stimulated work on
information systems in the field of inorganic materials science. As a result, a
large number of highly specialized information systems have been developed, each
shaped by the specifics of its
subject domain and the research areas of the organization developing it. An example
is the family of information systems based on databases developed and maintained by
IMET RAS. Their core consists of a number of
databases which store data on a variety of properties of substances:
• «Diagram» – database (DB) on the phase diagrams of semiconductor systems;
• «Crystal» – DB on the properties of acousto-optical, electro-optical and
nonlinear-optical substances;
• «Phases» – DB on the general properties of ternary and quaternary compounds;
• «Bandgap» – DB on the band gap of inorganic substances [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ];
• «Elements» – DB on the properties of chemical elements.
        </p>
        <p>
          These databases are heterogeneous not only by data structures, but also by software
and hardware tools ensuring their operation [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Note that the above-mentioned
DBs contain extensive information, but each covers a fairly narrow area. The situation when none
of the developed information systems contains a complete set of data on properties of
an object (substance or material) and the specialist needs to use several information
resources at once to search for the necessary information is typical not only for
inorganic materials science, but also for other subject domains.
        </p>
        <p>
          Obviously, to ensure a high-quality information service for materials scientists,
information systems integration in this subject domain is necessary. In Russia, the first
successful attempts in this direction were undertaken at the beginning of the century at
the IMET RAS for the integration of information systems mostly used by Russian users
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The integration allowed a consolidation of information resources for end users and
a significant reduction of the time spent by specialists to find the necessary information.
The applied consolidation approach was based on the Enterprise Application
Integration (EAI) method and showed its efficiency and good scalability when connecting
resources developed in different organizations to the integrated information system [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
For example, «TCS» (Thermal Constants of Substances, a reference book on the thermal
constants of substances developed by the Joint Institute for High Temperatures of the Russian
Academy of Sciences (JIHT RAS) together with Moscow State University (MSU))
and «AtomWork» (an information system on the properties of inorganic substances developed
by the National Institute for Materials Science (NIMS), Japan) are among the successfully
integrated systems [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>One of the main difficulties in integrating heterogeneous information systems (IS)
is the diversity of the chemical objects they describe. For example, the «Diagram»
IS contains information at the level of the chemical system, i.e. a set of chemical
elements that form a certain phase diagram of a semiconductor system. Other IS on
inorganic substances and materials properties (ISMP) describe properties at the level of a specific
quantitative composition (with a specific ratio of elements in the chemical system),
taking into account the crystal modifications of substances, i.e. both the quantitative
composition of the substance and its crystal lattice are described at this level. This
incompatibility of chemical object descriptions across IS ISMP means that an integrated
IS ISMP needs its own description of chemical objects; at a minimum, it has to
distinguish between several types of chemical objects: chemical systems, substances
and their crystal modifications.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Chemical objects hierarchy</title>
        <p>Set theory is used to describe the basic chemical objects of the problem domain,
taking into account that each subsequent level in the problem domain hierarchy
complements the description of the chemical object. The notation is as follows: S is
the set of chemical systems; C is the set of chemical substances, i.e. chemical compounds,
solid solutions, heterogeneous mixtures, etc.; M is the set of crystal modifications. A
chemical system is denoted by s (where s ∈ S), a chemical substance by c
(where c ∈ C), and a crystal modification by m (where m ∈ M).</p>
        <p>
          Using the term «substance» for second-level objects, we get a three-level
hierarchy of chemical objects: chemical system, chemical substance and crystal
modification [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Since the information stored in DBs on inorganic substances properties can
be considered at the chemical system level, for simplicity we take this level as the top
of the object hierarchy. The hierarchy of chemical objects and the relationships between
them can then be represented as a tree
(Fig. 1).
        </p>
        <p>Any chemical system s can be represented as a set of chemical elements ei:
s = {e1, e2,…, en}. A chemical substance c is defined not only by its set of atoms
(chemical elements), but also by their quantitative content in the
compound, solution or mixture. Therefore, any substance c can be represented
by a tuple (s, f), where s ∈ S, and f is a mapping from the set of atoms (chemical elements)
that make up the substance into the set of pairs R* × R* that define the minimum and
maximum content of a given chemical element in the compound, solution or
mixture c.</p>
        <p>That is, f : ei → (R*min, R*max), where R* = R+ ∪ {x}. Here R+ is the set of non-negative
real numbers, and R* is R+ extended by the element x. The element x
denotes an unknown number: in the notation of mixtures whose component content
may vary, it is customary to use x for the unknown, for example,
Fe1-xSex. R*min and R*max are, respectively, the minimum and maximum concentration
of the chemical element ei in the substance c.</p>
        <p>When the concentration of a particular chemical element ei in the
substance c is fixed, R*min = R*max. A crystal modification m can be represented by a
tuple (s, f, mod), where s ∈ S, f : ei → (R*min, R*max), and mod is the string notation for
the crystal modification of the substance that is common across the integrated IS ISMP (one of the
crystal system (syngony) values: {Triclinic, Monoclinic, Orthorhombic, Tetragonal,
Trigonal, Hexagonal, Cubic}).</p>
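        <p>The three-level description above can be illustrated with a minimal Python sketch. The class names Substance and Modification, the helper make, and the use of None for the unknown x are assumptions made for this illustration only; they are not taken from the integrated system itself.</p>
        <preformat>
```python
from dataclasses import dataclass

# Hypothetical names for illustration; not part of the described system.
CRYSTAL_SYSTEMS = {
    "Triclinic", "Monoclinic", "Orthorhombic", "Tetragonal",
    "Trigonal", "Hexagonal", "Cubic",
}


@dataclass(frozen=True)
class Substance:
    """A substance c = (s, f)."""
    system: frozenset   # s: the set of element symbols
    content: tuple      # f: sorted (element, (minimum, maximum)) pairs

    @staticmethod
    def make(content: dict) -> "Substance":
        # content maps element -> (minimum, maximum); None stands for the unknown x
        return Substance(frozenset(content), tuple(sorted(content.items())))

    def has_fixed_composition(self) -> bool:
        # Fixed composition: minimum equals maximum for every element.
        return all(
            low is not None and low == high
            for _, (low, high) in self.content
        )


@dataclass(frozen=True)
class Modification:
    """A crystal modification m = (s, f, mod); (s, f) is carried by the substance."""
    substance: Substance
    mod: str

    def __post_init__(self):
        if self.mod not in CRYSTAL_SYSTEMS:
            raise ValueError(f"unknown crystal system: {self.mod}")


linbo3 = Substance.make({"Li": (1.0, 1.0), "Nb": (1.0, 1.0), "O": (3.0, 3.0)})
linbo3_trigonal = Modification(linbo3, "Trigonal")
```
        </preformat>
        <p>Here the chemical system of LiNbO3 is available as linbo3.system, so moving one level up the hierarchy requires no extra lookup.</p>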
      </sec>
      <sec id="sec-2-3">
        <title>Metabase structure</title>
        <p>When designing an integrated IS ISMP, it is necessary to provide facilities for
searching for relevant information contained in the other IS ISMP of the distributed system.
This calls for an active data store that “knows” what
information each integrated IS ISMP contains. Given the chemical objects
hierarchy, there should be a database that describes the information contained in the integrated
resources in terms of chemical systems, substances and crystal modifications. This leads
to the metabase concept: a special database containing metadata that describe
the contents of the integrated IS ISMP in terms of the chemical objects hierarchy, together with
additional information on users and their permissions and the information
required to integrate the distributed IS ISMP (Fig. 2).</p>
        <p>The metabase defines the capabilities of the integrated IS. Its structure should be flexible
enough to represent metadata on the contents of the integrated IS ISMP and, at the same time,
simple and versatile enough to describe an arbitrary data source on
inorganic substances properties without the extensive additional payload that
numerous materials ontologies currently impose. Because chemical
objects and their properties are described at different levels of detail in
different IS ISMP, the metabase structure must be suitable
for describing the information residing in each of them. For example, some
integrated DBs contain information on the properties of particular crystal modifications, while
others describe properties at the chemical system level. Thus, the integrated IS
ISMP deal with chemical objects situated at different levels of the chemical objects
hierarchy. For simplicity, this paper considers only the part of the metabase structure
that is devoted to chemical systems and their properties (Fig. 2). This
metainformation should be sufficient to search for relevant information on
systems and their properties.</p>
        <p>
          All tables (Fig. 2) can be logically separated into several groups according to their
purpose:
• DBInfo – root table that contains information on the integrated database systems;
• DBExcludeCompatibility – table that stores the exception list of IS for relevant
information search;
• UsersInfo, UsersAccess – tables that contain information on integrated system users
and their access rights to the integrated IS ISMP;
• SystemInfo, PropertiesInfo, DBContent – tables that describe the contents of the integrated
IS ISMP;
• CompatibilityClasses, Compatibility, Systems2ConsiderInCompatibility – tables
that contain information on the available relevance classes and their contents (currently
3 relevance classes are used [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]);
• Meta_Systems, Meta_DBSystems, Meta_SystemsHierarchy,
Meta_SystemsElement – tables that describe all chemical systems contained within the integrated IS ISMP,
their relations to each other and the chemical elements they consist of;
• Versions – service table (not shown in the diagram) used for database schema
updates and versioning.
        </p>
        <p>
          Taking into account chemical objects hierarchy description, a special method was
developed to search for relevant information in the context of an integrated information
system, based on a set-theoretic approach [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>A set-theoretic approach to relevance evaluation</title>
      <p>
        The notion of relevance and its role in information search is a philosophical topic covered in
numerous publications. A comprehensive review of relevance is given by Tefko
Saracevic [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We consider information search relevance as applied to integrated IS
ISMP, an area close to “chemical similarity” [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. As noted above, a special method based on a set-theoretic approach was developed to search for relevant
information in the context of an integrated IS ISMP, taking the chemical objects hierarchy into account [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
The essence of the set-theoretic approach is the use of the above-mentioned metabase,
a special database that contains information on the integrable IS ISMP (set
D), chemical systems (set S) and their properties (set P). To describe the relationship
between the elements of the sets D, S, and P, a ternary relation W is defined on the
set U (the universe), which is the Cartesian product U = D × S × P. Membership of an element (d, s, p),
where d ∈ D, s ∈ S, p ∈ P, in the relation W is interpreted as follows: “the
integrable IS ISMP d contains information on the property p of the chemical system s”.
      </p>
      <p>
        Thus, in this notation, the search for relevant information on a
particular chemical system s reduces to a proper definition of a relation R, which is a
subset of the Cartesian product S × S (in other words, R ⊂ S²). For any pair (s1,
s2) ∈ R, we can state that the system s2 is relevant to the system s1. In practical
searches for relevant information in integrable information
systems, the following rules are often used to construct R [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]:
1. For any s1 ∈ S, s2 ∈ S given as sets of chemical elements eij,
s1 = {e11, e12, ..., e1n}, s2 = {e21, e22, ..., e2m}, if s1 ⊆ s2 (that is, all
chemical elements of the system s1 are contained in the system s2), then (s1, s2) ∈ R.
2. The relation R is symmetric. In other words, for any s1 ∈ S, s2 ∈ S, if
(s1, s2) ∈ R, then (s2, s1) ∈ R.
      </p>
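      <p>The two rules can be sketched in a few lines of Python. The function name build_relevance_relation and the three sample systems are chosen for this illustration, and the sketch assumes that the trivial pairs (s, s) are left out of R.</p>
      <preformat>
```python
from itertools import permutations


def build_relevance_relation(systems):
    """Return R as a set of ordered pairs (s1, s2) built from rules 1 and 2."""
    relation = set()
    for s1, s2 in permutations(systems, 2):
        if s1.issubset(s2):
            relation.add((s1, s2))  # rule 1: s1 is contained in s2
            relation.add((s2, s1))  # rule 2: symmetric closure
    return relation


IN_S = frozenset({"In", "S"})
CU_IN_S = frozenset({"Cu", "In", "S"})
GA_AS = frozenset({"Ga", "As"})

R = build_relevance_relation([IN_S, CU_IN_S, GA_AS])
```
      </preformat>
      <p>With these inputs R holds only the pair In-S and Cu-In-S in both directions; Ga-As shares no subset relation with the other systems and stays unrelated.</p>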
      <p>
        Note that the automatic construction of R described above is
just one of the simplest and most obvious variants of such rules; more
complex mechanisms can be used to obtain R. Alternative ways of building the
relation R are called relevance classes. For example, when browsing information on a particular
property of a compound in one of the integrated IS ISMP (information defined
by a triplet (d1, s1, p1)), we may consider a triplet (d2, s2, p2) to be relevant information, where (d2, s2,
p2) characterizes information on some other property of a chemical system from
another integrated IS ISMP. This makes it possible to define relevant information more
precisely, e.g. by considering relations of the form R ⊂ (D × S × P) × (D × S × P), i.e. sets of pairs ((d1, s1, p1), (d2, s2, p2)), where
d1, d2 ∈ D, s1, s2 ∈ S, p1, p2 ∈ P. It is even possible to define a set of several
relations (R1, R2, …, Rn) by applying different rules, so that users can search
for relevant information under a wide variety of interpretations of R. However, such complex
interpretations of R are not currently used at IMET
RAS, since storing them would require a more complex metabase structure while
the benefit is not clear. IMET RAS uses simple relevance relations R ⊂ S².
More rules for forming relevance classes are given in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Search relevance can also be improved by working at the ci level, i.e.
taking into account the quantitative composition of a substance, or at the level of crystal modifications
mi of a specific substance, instead of chemical system designations si, in cases when a
user requests relevant information while at the level of inorganic substances or their
modifications in the system-substance-modification hierarchy [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>When searching at the substance level, the quantitative composition of the compound is
taken into account. The pair (aimin, aimax) denotes the quantitative content of the chemical
element ei ∈ s in the composition, where aimin, aimax ∈ R+ and aimin ≤ aimax. If aimin = aimax, then
the substance has a constant composition with respect to the element ei ∈ s. For each element
ei ∈ s of the chemical system, the user can specify during the search a pair (rimin, rimax), where
rimin, rimax ∈ R+, denoting the allowable interval for the content of the i-th element in the substance (R+
is the set of non-negative real numbers). A substance belonging to the same
chemical system is then considered relevant if for each pair (rimin, rimax) the following
holds: aimin ∈ [rimin, rimax] or aimax ∈ [rimin, rimax]. In other words, if the logical
disjunction (rimin ≤ aimin ∧ aimin ≤ rimax) ∨ (rimin ≤ aimax ∧ aimax ≤ rimax) is true for all ei ∈ s, then
the data on the substance are considered relevant.</p>
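      <p>As a minimal sketch, the substance-level criterion can be written as a predicate over two dictionaries. The names is_relevant and in_interval and the dictionary layout are assumptions of this example; the test itself follows the condition stated above literally.</p>
      <preformat>
```python
def in_interval(value, low, high):
    """True when value lies in the closed interval [low, high]."""
    return high >= value >= low


def is_relevant(composition, request):
    """composition: element -> (a_min, a_max); request: element -> (r_min, r_max).

    Both are assumed to refer to the same chemical system.
    """
    for element, (r_min, r_max) in request.items():
        a_min, a_max = composition[element]
        if not (in_interval(a_min, r_min, r_max)
                or in_interval(a_max, r_min, r_max)):
            return False
    return True


IN2S3 = {"In": (2.0, 2.0), "S": (3.0, 3.0)}
matches = is_relevant(IN2S3, {"In": (1.0, 2.0), "S": (2.5, 3.5)})
rejected = is_relevant(IN2S3, {"In": (3.0, 4.0), "S": (2.5, 3.5)})
```
      </preformat>
      <p>Elements for which the user gives no interval are simply not checked, which matches the wording "for each pair" in the criterion.</p>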
      <p>When searching for relevant information at the level of crystal
modifications mi, crystal systems are taken into account, since information on crystal
structures is often given in different ways. For example, different information sources
of the IS ISMP indicate either a hexagonal or a trigonal crystal system for lithium niobate (LiNbO3),
although both, in fact, correspond to the same crystal modification.</p>
      <p>
        However, although the described approach
generally provides an acceptable level of search relevance for inorganic compounds, it
cannot produce a quantitative assessment of search relevance and,
as a consequence, offers no way to change the search results by adjusting
parameters or the corresponding metrics. Such an adjustment is useful in some
cases, in particular when preparing training data sets for machine learning tasks in the
computer-aided design of inorganic compounds [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>Graph approach to relevance assessment</title>
      <p>To search for relevant information and obtain a quantitative measure of
relevance within an integrated information system on the properties of inorganic
substances and materials, we propose a graph model based on a weighted graph
G = (V, E) built on the chemical objects described in the integrated information
system.</p>
      <p>Let us define the set of vertices V of the graph G. In accordance with the accepted
three-level description of chemical objects in an integrated information system, the set of
vertices is the union of three disjoint subsets, V = S ∪ C ∪ M, where S is the set of chemical
systems si (qualitative compound composition), C is the set of chemical compounds ci
(the quantitative compound composition or the substance formula), and M is the set of
crystal modifications mi of specific substances.</p>
      <p>The set of edges E of the graph G is defined as the union of disjoint subsets
E = Es ∪ Ec ∪ Em ∪ Esc ∪ Ecm, where Es contains edges incident only to vertices
of the set S; Ec contains edges incident only to vertices of the set of substances C; and Em contains
edges incident only to vertices of the set of modifications M. Connectivity between the
classes S, C and M is provided by two sets of edges: Esc edges connect vertices from
S and C, and Ecm edges connect vertices from C and M. Note that there are no
edges connecting vertices from S and M.</p>
      <p>To define the elements of the subsets of E we introduce two simple
functions, Fs(c) and Fc(m). The function Fs(c) returns the chemical system of a given
compound c, i.e. it yields the qualitative composition from the quantitative composition.
The function Fc(m) returns the quantitative composition of a particular crystal modification
of the substance, i.e. it yields the quantitative composition from a particular crystal
structure of the compound. Then, given that a chemical system is a set of chemical
elements s = {e1, e2, ..., en}, we get the following sets of edges:</p>
      <p>Es = {(si, sj)}, where si = {ei1, ei2, ..., ein}, sj = {ej1, ej2, ..., ejm}, |si| = n, |sj| = m,
m – n = 1, si ⊂ sj;</p>
      <p>Ec = {(ci, cj)}, where Fs(ci) = Fs(cj);
Em = {(mi, mj)}, where Fc(mi) = Fc(mj);
Esc = {(si, cj)}, where Fs(cj) = si;
Ecm = {(ci, mj)}, where Fc(mj) = ci.</p>
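      <p>The five edge sets can be generated directly from their definitions. The sketch below is illustrative only: it assumes that a compound is stored as a tuple of (element, amount) pairs and a modification as a (compound, label) pair, and the names f_s, f_c and build_edges are hypothetical.</p>
      <preformat>
```python
from itertools import combinations


def f_s(compound):
    """Fs: chemical system (set of elements) of a compound."""
    return frozenset(element for element, _ in compound)


def f_c(modification):
    """Fc: compound (quantitative composition) of a crystal modification."""
    return modification[0]


def build_edges(systems, compounds, modifications):
    """Build Es, Ec, Em, Esc and Ecm from their set definitions."""
    e_s = {(a, b) for a in systems for b in systems
           if len(b) - len(a) == 1 and a.issubset(b)}
    e_c = {(a, b) for a, b in combinations(compounds, 2) if f_s(a) == f_s(b)}
    e_m = {(a, b) for a, b in combinations(modifications, 2) if f_c(a) == f_c(b)}
    e_sc = {(s, c) for s in systems for c in compounds if f_s(c) == s}
    e_cm = {(c, m) for c in compounds for m in modifications if f_c(m) == c}
    return {"Es": e_s, "Ec": e_c, "Em": e_m, "Esc": e_sc, "Ecm": e_cm}


IN_S = frozenset({"In", "S"})
CU_IN_S = frozenset({"Cu", "In", "S"})
INS = (("In", 1.0), ("S", 1.0))
IN2S3 = (("In", 2.0), ("S", 3.0))
ALPHA = (IN2S3, "alpha")
BETA = (IN2S3, "beta")

EDGES = build_edges([IN_S, CU_IN_S], [INS, IN2S3], [ALPHA, BETA])
```
      </preformat>
      <p>Each undirected edge is stored once as an ordered pair; no pair ever links a system directly to a modification, in line with the definition of E.</p>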
      <p>For information on one chemical object to be relevant to another, a
path must exist in the graph between the two objects; the measure of relevance is then
calculated by adding up the weights of the edges along that
path. This requires a real-valued
function W defined on the set of graph edges:</p>
      <p>W(Es) = 1000;</p>
      <p>
        <disp-formula><tex-math>W(E_c) = W((c_i, c_j)) = \min\left(\sum_{k=0}^{n} 10 \cdot |q_{ik} - q_{jk}|\right);</tex-math></disp-formula>
where n = | Fs(ci) | = | Fs(cj) |, qik and qjk are the quantitative occurrences of the k-th element
in the compositions ci and cj, i.e. Q: ek → R+ (respectively Q(eik) = qik, Q(ejk) = qjk), and the
order of elements in the substances is selected so as to ensure the minimum value of the W(Ec)
objective function.</p>
      <p>W(Em) = 0.1;
W(Esc) = 100;
W(Ecm) = 1.</p>
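      <p>A possible reading of the weight W(Ec) is sketched below. It assumes that the quantities of the two compounds are given as equally long tuples and that "selecting the order of elements" means minimising over all orderings of one tuple; this interpretation, like the function name compound_edge_weight, is an assumption of the example rather than a statement of the original implementation.</p>
      <preformat>
```python
from itertools import permutations

# Fixed weights of the other edge classes, as listed above.
W_ES, W_EM, W_ESC, W_ECM = 1000, 0.1, 100, 1


def compound_edge_weight(q_i, q_j):
    """10 * sum of |q_ik - q_jk|, minimised over orderings of q_j."""
    if len(q_i) != len(q_j):
        raise ValueError("compounds must belong to the same chemical system")
    return min(
        sum(10 * abs(a - b) for a, b in zip(q_i, ordering))
        for ordering in permutations(q_j)
    )


weight_ins_in2s3 = compound_edge_weight((1, 1), (2, 3))    # InS vs In2S3
weight_in2s3_in4s5 = compound_edge_weight((2, 3), (4, 5))  # In2S3 vs In4S5
```
      </preformat>
      <p>The exhaustive search over orderings grows factorially with the number of elements, which is acceptable for the small systems considered here but would need replacing for large ones.</p>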
      <p>As an example, consider a fragment of the relevance graph for the chemical systems
Cu-In-S and In-S (Fig. 3). We use this example to highlight the properties of the graph and justify the role
of edge weights in the quantitative assessment of the relevance of chemical objects.
It follows from the definition of the set of edges E that the relevance graph is
partitioned into subgraphs rooted at the vertices of the set S (chemical systems).
Moreover, no path between substances from different chemical systems
bypasses the vertices of chemical systems. The system vertices themselves are
connected by an edge only if the set of elements of one system is a proper subset
of the other and their cardinalities (i.e. the numbers of chemical elements that make up
each system) differ by one.</p>
      <p>Consider the subgraph built around the In-S chemical system vertex and
consisting of the substances and their modifications related to this system.
The subgraph composed of the vertices of the set C (compounds, i.e.
quantitative formulas) is complete: all its vertices (InS, In2S3, In4S5, In6S7) are
connected to each other and form a clique. Note, however, that the weights of the edges
connecting the substance vertices differ. An edge weight characterizes
how close the corresponding quantitative compositions are: the smaller the
difference, the lower the weight («cost») of the transition along the edge, and the
corresponding substance is considered more relevant than others reached at a greater transition cost.</p>
      <p>Similarly, the subgraph of modifications built around the vertex of
a particular compound is complete, and the weights of all its edges are equal to 0.1. In
Fig. 3 such an edge connects, e.g., the α-In2S3 and β-In2S3 vertices. Note
that the transition from a modification to the corresponding substance has a cost of 1, and
the transition from a substance to the system costs 100, which makes data on
other modifications (including crystal structure) more relevant than moving up through the level of
substances to choose another qualitative composition.</p>
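      <p>The relevance measure itself is a shortest-path cost, so it can be computed with Dijkstra's algorithm. The sketch below uses a hand-written fragment of the In-S subgraph with the edge weights given in this section; the weight 30 for the InS and In2S3 edge is what the compound formula gives as 10·(|1 − 2| + |1 − 3|). The vertex labels and the function name relevance_costs are assumptions of this example.</p>
      <preformat>
```python
import heapq
import math


def relevance_costs(edges, start):
    """Cheapest path cost from start to every reachable vertex (Dijkstra)."""
    adjacency = {}
    for u, v, weight in edges:  # undirected edges
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node]:
            continue  # stale queue entry
        for neighbor, weight in adjacency.get(node, []):
            new_cost = cost + weight
            if best.get(neighbor, math.inf) > new_cost:
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return best


GRAPH = [
    ("In-S", "Cu-In-S", 1000),             # Es
    ("In-S", "InS", 100),                  # Esc
    ("In-S", "In2S3", 100),                # Esc
    ("InS", "In2S3", 30),                  # Ec
    ("In2S3", "alpha-In2S3", 1),           # Ecm
    ("In2S3", "beta-In2S3", 1),            # Ecm
    ("alpha-In2S3", "beta-In2S3", 0.1),    # Em
]

costs = relevance_costs(GRAPH, "alpha-In2S3")
ranking = sorted(costs, key=costs.get)
```
      </preformat>
      <p>Sorting the vertices by cost yields the ranking of chemical objects by relevance to the starting object; vertices absent from the result are unreachable and therefore not relevant under this model.</p>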
    </sec>
    <sec id="sec-5">
      <title>Discussion and further model development</title>
      <p>The proposed graph model is an attempt to reflect the degree of similarity of
chemical objects, even at different representation levels (system, compound, modification). In
this sense, the path cost is a measure of the difference between the corresponding
chemical objects, which are the vertices of the graph. The more similar the objects, the
«closer» they are, i.e. the lower the path cost in the graph. Note that,
according to the definitions given in the paper, the overall relevance
graph is disconnected, because there is no path between the vertices of chemical
systems that have no common chemical elements (i.e. s1 ∈ S, s2 ∈ S such that
s1 ∩ s2 = ∅). For example, in the current model there is no connectivity between the In-S
and Ga-As chemical systems, although In and Ga are similar in many ways, since they
belong to the same subgroup of the periodic system. It is therefore
advisable to introduce rules for forming edges between similar substances and
systems (which differ by an element from the same subgroup of the periodic system),
although such an edge should have an appropriately (sufficiently) large weight compared
with edges between analogues that share chemical elements.</p>
      <p>One possible direction for further development of the graph model is to
replace each edge from the sets Esc and Ecm with a pair of arcs. In this case, the weight of
the arc directed from the modification to the substance, or from the substance
to the system, should be made much smaller than the weight of the original edge, while the
reverse arc should keep the original edge weight. This makes it cheaper to obtain
relevant information described one or two levels higher, which is a common way of
searching for information in chemistry.</p>
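      <p>The proposed replacement of edges by arc pairs could look as follows. The paper does not fix how much cheaper the upward arc should be, so the factor 0.01 and the name split_into_arcs are purely illustrative assumptions.</p>
      <preformat>
```python
def split_into_arcs(edges, upward_factor=0.01):
    """Turn (upper, lower, weight) edges from Esc or Ecm into directed arcs.

    upward_factor is a hypothetical tuning parameter, not a value from the paper.
    """
    arcs = []
    for upper, lower, weight in edges:
        arcs.append((lower, upper, weight * upward_factor))  # cheap move up the hierarchy
        arcs.append((upper, lower, weight))                  # reverse arc keeps the edge weight
    return arcs


ARCS = split_into_arcs([("In-S", "In2S3", 100), ("In2S3", "alpha-In2S3", 1)])
```
      </preformat>
      <p>A shortest-path search over such arcs has to treat the graph as directed, so that the reduced cost applies only when moving from a modification to its substance or from a substance to its system.</p>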
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this paper, the concept of relevant information search in integrated IS ISMP
was extended by means of a graph model. The new model yields a
quantitative relevance assessment of information retrieval based on path costs in a
weighted graph, which allows the chemical information found in consolidated
data sources to be ranked. The proposed approach is applicable not only to improving information
retrieval for end users (materials chemists), but also to the computer-aided
design of inorganic compounds, at the stage of forming training samples on the basis of the
quantitative relevance assessment.</p>
      <p>This work was partially supported by the Russian Foundation for Basic Research
(project no. 18-07-00080) and the State task № 075-00746-19-00.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kiselyova</surname>
            ,
            <given-names>N.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dudarev</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korzhuyev</surname>
            ,
            <given-names>M.A.:</given-names>
          </string-name>
          <article-title>Database on the bandgap of inorganic substances and materials</article-title>
          ,
          <source>Inorganic Materials: Applied Research</source>
          .
          <year>2016</year>
          . v.
          <volume>7</volume>
          . № 1. p.
          <fpage>34</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kiseleva</surname>
            ,
            <given-names>N.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prokoshev</surname>
            ,
            <given-names>I.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dudarev</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khorbenko</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belokurova</surname>
            ,
            <given-names>I.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Podbelsky</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zemskov</surname>
            ,
            <given-names>V.S.:</given-names>
          </string-name>
          <article-title>Database system on materials for electronics on the Internet</article-title>
          .
          <source>Inorganic materials</source>
          ,
          <year>2004</year>
          , v.
          <volume>40</volume>
          , №
          <issue>3</issue>
          , p.
          <fpage>380</fpage>
          -
          <lpage>384</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kornyshko</surname>
            ,
            <given-names>V.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dudarev</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          :
          <article-title>Software Development for Distributed Electronics Materials</article-title>
          .
          <source>In Proceedings of the Third International Conference “Information Research, Applications and Education - i.Tech”</source>
          , Sofia, FOI-Commerce,
          <year>2005</year>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dudarev</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiselyova</surname>
            ,
            <given-names>N.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamazaki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Virtual integration of the Russian and Japanese databases on properties of inorganic substances and materials</article-title>
          .
          <source>MITS</source>
          <year>2009</year>
          .
          <source>In Proceedings of Symposium on Materials Database, National Institute for Materials Science (NIMS)</source>
          ,
          <source>Materials Database Station (MDBS)</source>
          ,
          <year>2009</year>
          , p.
          <fpage>37</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dudarev</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          :
          <article-title>Integration of information systems in the field of inorganic chemistry and materials science</article-title>
          .
          <source>ISBN 978-5-396-00745-1</source>
          ,
          Moscow: KRASAND,
          <year>2016</year>
          , 320 p.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Saracevic</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: nature and manifestations of relevance</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <year>2007</year>
          , v.
          <volume>58</volume>
          (
          <issue>13</issue>
          ), p.
          <fpage>1915</fpage>
          -
          <lpage>1933</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Sen'ko</surname>
            ,
            <given-names>O.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiselyova</surname>
            ,
            <given-names>N.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dudarev</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dokukin</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ryazanov</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          :
          <article-title>Various Machine Learning Methods Efficiency Comparison in Application to Inorganic Compounds Design</article-title>
          .
          <source>In: Selected Papers of the Data Analytics and Management in Data Intensive Domains. Proceedings of the XX International Conference - DAMDID / RCDL'2018, October 9-12</source>
          ,
          <year>2018</year>
          , Moscow, V.
          <volume>2277</volume>
          , p.
          <fpage>152</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Serain</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Middleware and Enterprise Application Integration</article-title>
          . London: Springer-Verlag,
          <year>2002</year>
          .
          <source>ISBN 978-1-85233-570-0</source>
          . 288 p.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Maggiora</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          :
          <article-title>Concepts and Applications of Molecular Similarity</article-title>
          . New York: John Wiley &amp; Sons,
          <year>1990</year>
          .
          <source>ISBN 978-0-471-62175-1</source>
          . 393 p.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>