<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CONSTRUCTING THE ANALYTICAL MODEL FOR SPECIALIZED MODEL-DRIVEN SYSTEM OF SCIENTIFIC DATA CONSOLIDATION</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna V. Korobko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexey A. Korobko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computational Modelling SB RAS</institution>
          ,
          <addr-line>Krasnoyarsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Efficient storage and analytical processing of experimental results is a major part of obtaining new scientific knowledge. Specialized systems for consolidating scientific data allow researchers to record the results of their studies. Model-driven systems contain a runtime control model that describes the object of study in detail. The paper proposes an algorithm for automatically generating an analytical model from the control model of a scientific data consolidation system.</p>
      </abstract>
      <kwd-group>
        <kwd>analytical model</kwd>
        <kwd>data consolidation and processing</kwd>
        <kwd>research data</kwd>
        <kwd>model-driven development</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>become developing means of native formation of analytical queries to data. The main idea is to
build a transparent analytical model [5].</p>
      <p>What distinguishes the software platform for building systems that collect and analyze
research data is that the control model is formed while a specialized system is being created.
The control model describes the composition of the consolidated data and defines the interface of
the system being created. It is the thematic core of the specialized system. It is stored as
metadata and can serve as the basis for building an analytical model.</p>
      <p>This article presents an original approach to forming the analytical model of a
specialized model-driven system from its control model. The first section is devoted to the
concept of model-driven development (MDD) and to the formal description of the control model
generated in the present software platform, using the System of research of the state of the soil
cover as an example. The second section presents the theory of constructing an analytical model in
accordance with the CWM specification. The third section contains an algorithm for forming the
analytical model of a specialized system for collecting research data from its control model. The
conclusion summarizes the results of the article and formulates the tasks for continuing this
study.</p>
      <p>The control model for a specialized model-driven system of data consolidation. The
classical model-driven development (MDD) approach involves constructing a set of models at
different levels of abstraction during the design and implementation of software [6].
Developing the high-level abstraction models includes constructing a meta-metamodel (M3) and a
metamodel (M2). M3 is the model of the modeling language. M2 is the logical model of the
application's subject area in the meta-metamodel notation. Designing the low-level abstraction
models consists of building application-level models (M1) and forming instances (M0) of the
concepts defined at the M1 level [7].</p>
      <p>The authors of this paper have previously proposed an original implementation of the
model-driven development approach. Its main advantages over the classical approach are that the
control model (the M1-level model) is formed while the system is being built through the platform,
and that application models are generated automatically from the control model. This
implementation significantly reduces the requirements for the user's qualification in information
technology and allows the user to focus on research. At the same time, the approach retains the
flexibility and versatility of model-driven development. The control model is included in every
system in the form of metadata. This makes it possible to respond quickly to changing requirements
for thematic content and to develop the systems continuously as the research grows. These
advantages are achieved by deliberately narrowing the platform functionality and by developing a
platform metamodel (M2 level) that corresponds to the purpose of the systems. The functionality of
the platform is focused on building systems for collecting and consolidating research data. The
Unified Modeling Language (UML) notation is used as the meta-metamodel (level M3).</p>
      <p>The metamodel contains three classes of objects: the class "Object" N, the class "Attribute" F,
and the class "Group" G. Objects of the class "Attribute" are described by the triple F=(A, T, D),
where A is the attribute name, T is the name of the specialized attribute type, and D is the
attribute temporality flag. An example of a control model built in accordance with the proposed
metamodel is shown in Fig. 1.</p>
      <p>The metamodel defines relationships between instances (objects) of the model classes. Two types
of relations, "Nesting" and "Dependence", are defined between objects. The one-to-many relation
"Nesting", denoted φ, is defined on the set N, φ⊆N×N, and sets the organizational hierarchy of
objects. The single-valued relation "Dependence", denoted χ, is also defined on the set N, χ⊆N×N.
It links objects to each other, implementing various functional interactions. The multi-valued
correspondence between objects and attributes is "Ownership", denoted θ, where θ⊆N×F. The
single-valued correspondence between objects and groups is "Consolidation", denoted ψ, where
ψ⊆N×G. The metamodel is described in more detail in [8].</p>
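      <p>The sets and relations above can be written down as plain data structures. The following is a minimal sketch, not the platform's implementation: the class and method names, the direction of the pairs in each relation, and the attributes "Name" and "Result" with their type names are assumptions made for illustration. Only the objects "Biotest" and "Biotesting of reference samples" come from the example system.</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """One element of F, the triple (A, T, D)."""
    name: str        # A: attribute name
    type_name: str   # T: name of the specialized attribute type
    temporal: bool   # D: temporality flag


@dataclass
class ControlModel:
    """Sets and relations of the control-model metamodel (illustrative)."""
    objects: set = field(default_factory=set)        # N
    groups: set = field(default_factory=set)         # G
    nesting: set = field(default_factory=set)        # phi: (parent, child) pairs
    dependence: set = field(default_factory=set)     # chi: (n, k) pairs
    ownership: set = field(default_factory=set)      # theta: (object, Attribute) pairs
    consolidation: set = field(default_factory=set)  # psi: (object, group) pairs

    def attributes_of(self, obj):
        """Names of the attributes owned by obj, sorted for stable output."""
        return sorted(f.name for n, f in self.ownership if n == obj)

    def dependencies_of(self, obj):
        """Objects k such that obj chi k."""
        return {k for n, k in self.dependence if n == obj}


# Hypothetical fragment of a control model; attribute names and types are assumed.
model = ControlModel()
model.objects |= {"Biotest", "Biotesting of reference samples"}
model.dependence.add(("Biotesting of reference samples", "Biotest"))
model.ownership.add(("Biotest", Attribute("Name", "string", False)))
model.ownership.add(
    ("Biotesting of reference samples", Attribute("Result", "float", True))
)
```
      </preformat>
      <p>Storing each relation as a set of pairs keeps the sketch close to the set-theoretic notation. A real system would more likely keep this metadata in database tables.</p>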
      <p>The control model, formed under the proposed metamodel, formally describes both the
subject of study and the results obtained in the course of scientific research. The procedure for
forming the control model, and the requirements on its elements that ensure consistent data storage
and reliable analysis results, are given in [3].</p>
      <p>
        The analytical model. The concept of model-driven development described above is widely
used in many areas of information technology. In particular, in 2003 the Object Management
Group (OMG) consortium proposed its own Common Warehouse Metamodel (CWM) to provide
information exchange between heterogeneous systems for the purpose of analytical data processing
[9].
      </p>
      <p>The CWM specification is a set of M2-level models (in terms of the MDD approach) that
describe relational sources, XML documents, and the multidimensional analytical model (Fig. 2).
The analytical model is described in terms of On-Line Analytical Processing (OLAP) technology and
the Data warehouse structure. The Data warehouse is designed to consolidate "operational" data
from heterogeneous sources and prepare them for on-line analytical processing.</p>
      <p>
        A key requirement of OLAP is presenting data in a multidimensional form that is intuitive to
the user. Multidimensionality is the division of data into dimensions and measures. Dimensions are
aspects of analysis. Measures are aggregated numerical characteristics of the process under study.
The analytical model of the CWM specification is based on the dimensional fact model proposed by
        <xref ref-type="bibr" rid="ref10">Matteo Golfarelli (et al.) in 1998</xref>
        [10] (Fig. 2). The measures that describe a single analyzed
fact must be grouped in a "Cube". For example, the measures "Exchange acidity", "Humus" and
"Fraction of 0.01" characterize the fact "Physical and chemical properties of the soil sample".
Dimensions have a hierarchical structure and combine several levels of analysis (Hierarchy). For
example, the "Time" dimension includes hierarchy levels such as "Date", "Day of the week",
"Month", "Quarter", "Year", "Season", and so on. Within the same schema, cubes and dimensions
are linked by an association relationship ("CubeDimensionAssociation"), which takes into account
the different levels of the hierarchy. The dimension values are designated as a separate class
("MemberSelection").
      </p>
      <p>Fig. 2. The metamodel of the analytical model of the CWM specification.</p>
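      <p>The cube, dimension, and association concepts described above can be illustrated with a small in-memory structure. This is a simplified sketch under stated assumptions, not the CWM OLAP package itself: the class layout, the method names, and the representation of hierarchy levels as a flat list are illustrative choices. The cube, measure, and level names are taken from the soil example in the text.</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str
    levels: list = field(default_factory=list)  # hierarchy levels (flattened here)


@dataclass
class Cube:
    name: str
    measures: list = field(default_factory=list)  # measures describing one fact


@dataclass
class Schema:
    cubes: dict = field(default_factory=dict)
    dimensions: dict = field(default_factory=dict)
    associations: set = field(default_factory=set)  # (cube name, dimension name)

    def add_cube(self, cube):
        self.cubes[cube.name] = cube

    def add_dimension(self, dimension):
        self.dimensions[dimension.name] = dimension

    def associate(self, cube_name, dimension_name):
        """Record a cube-dimension association between existing elements."""
        if cube_name not in self.cubes or dimension_name not in self.dimensions:
            raise KeyError("both the cube and the dimension must be in the schema")
        self.associations.add((cube_name, dimension_name))

    def dimensions_of(self, cube_name):
        """Names of the dimensions associated with a cube, sorted."""
        return sorted(d for c, d in self.associations if c == cube_name)


FACT = "Physical and chemical properties of the soil sample"

schema = Schema()
schema.add_cube(Cube(FACT, ["Exchange acidity", "Humus", "Fraction of 0.01"]))
schema.add_dimension(Dimension(
    "Time", ["Date", "Day of the week", "Month", "Quarter", "Year", "Season"]
))
schema.associate(FACT, "Time")
```
      </preformat>
      <p>Here schema.dimensions_of(FACT) lists the dimensions along which the soil measures can be analyzed. The sketch omits the "MemberSelection" class and the per-level detail of the association.</p>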
      <p>Constructing a multidimensional analytical model for a model-driven system makes it possible
to process the results of analytical queries in Business Intelligence (BI) systems that support
information exchange in accordance with the CWM specification. The task at hand is to develop an
algorithm that forms the multidimensional model from the properties of the control model, which is
built under the original platform metamodel.</p>
      <p>Algorithm for forming the analytical model. To describe the algorithm, we use the formal
set-theoretic description, given above, of the metamodel of the control model of the specialized
model-driven system.</p>
      <p>The algorithm (Fig. 3) consists in sequential consideration of instances of the class "Object"
of the control model (∀n∈N) and their attributes (∀f∈F | fθn). In the theory of multidimensional
modeling, attributes are divided into "non-aggregated" (dimensional) and "aggregated"
(non-dimensional). "Non-aggregated" attributes form the "dimension", and the "aggregated"
attributes of one object (or the fields of one table, in the classical case) make up the "cube".
Dimensional attributes can also be aggregated, because the aggregate function "count" can be
applied to them. The exception is dimensions that are functionally independent of other dimensions.
In this case, the "isMeasure" property of the analytical model dimension is set to "false". In the
algorithm, the standard check of the attribute type is supplemented by a procedure that parses the
attribute name. Despite its numeric type, an attribute is not aggregated if its name contains
marker words that indicate its "reference" character. The set of marker words can be expanded as
precedents accumulate.</p>
      <p>The "Cube" of each considered object is added to the set of cubes C of the analytical model.
An associative relationship, written as the relation σ, is established between the object n and the
cube it generates. Similarly, the set of dimensions D of the analytical model is formed, and an
associative relationship δ between the object and the dimension is established. These relations are
required by the algorithm that forms associations between cubes and dimensions (Fig. 4).</p>
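      <p>The classification step described above (a type check supplemented by marker-word parsing of the attribute name) can be sketched as follows. This is an illustrative reading of the text, not a transcription of Fig. 3. The numeric type names, the marker words, the attributes "Sampling date", "Sample number" and "Name", and the decision to skip empty cubes and dimensions are all assumptions of this sketch. The "isMeasure" handling is omitted.</p>
      <preformat>
```python
NUMERIC_TYPES = frozenset({"integer", "float"})    # assumed type names
MARKER_WORDS = ("code", "number", "index", "id")   # assumed marker words


def is_aggregated(attr_name, type_name):
    """Numeric type check, supplemented by marker-word parsing of the name."""
    if type_name not in NUMERIC_TYPES:
        return False
    words = attr_name.lower().replace("_", " ").split()
    return not any(word in MARKER_WORDS for word in words)


def build_cubes_and_dimensions(ownership):
    """Split the attributes of every object into a cube and a dimension.

    ownership maps each object n to a list of (attribute name, type name).
    Returns sigma (object to cube measures) and delta (object to dimension
    attributes). Empty cubes and dimensions are skipped (sketch assumption).
    """
    sigma = {}
    delta = {}
    for obj, attrs in ownership.items():
        measures = [a for a, t in attrs if is_aggregated(a, t)]
        dim_attrs = [a for a, t in attrs if not is_aggregated(a, t)]
        if measures:
            sigma[obj] = measures
        if dim_attrs:
            delta[obj] = dim_attrs
    return sigma, delta


# Hypothetical control-model fragment.
ownership = {
    "Soil sample": [
        ("Sampling date", "date"),
        ("Sample number", "integer"),
        ("Exchange acidity", "float"),
        ("Humus", "float"),
    ],
    "Biotest": [("Name", "string")],
}
sigma, delta = build_cubes_and_dimensions(ownership)
```
      </preformat>
      <p>In this example "Sample number" has a numeric type but is kept out of the cube because its name contains the marker word "number". "Exchange acidity" and "Humus" become measures.</p>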
      <p>The relation χ describes the relationship between the objects of the control model. The
expression nχk, where n,k∈N, means that object n has an attribute associated with object k. For
example, the object "Biotesting of reference samples" is associated with the object "Biotest". This
means that to set the values of the results of soil sample biotesting, it is necessary to select the
specific type of "Biotest". In turn, "Biotest" is an independent object. The user can expand the set of
its attributes and add new values if necessary.</p>
      <p>When the analytical model is constructed, the χ relation is treated as a foreign key or a
relation of functional dependence. Associative links between cubes and dimensions are created in
two cases. First, the cube σ(n) generated by an object n must be associated with the dimension δ(n)
generated by the same object. Second, σ(n) must be associated with all dimensions generated by
objects directly related to n, as well as with dimensions generated by objects transitively related
to n by the χ relation (the elements of the set χ*(n)). The set χ*(n) is the image of object n under
the transitive closure of the relation χ.</p>
      <p>Fig. 4. Algorithm for forming the association relation between dimensions and cubes of the analytical model.</p>
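      <p>The two association cases can be sketched with an explicit computation of χ*(n). This is a minimal illustration, not a transcription of Fig. 4. The χ relation is represented as a dictionary from an object to the set of objects it depends on, cubes and dimensions are identified by the objects that generate them, and the object "Test organism" is hypothetical.</p>
      <preformat>
```python
def transitive_closure(chi, start):
    """Return chi*(start): every object reachable from start through chi."""
    reached = set()
    stack = list(chi.get(start, ()))
    while stack:
        k = stack.pop()
        if k not in reached:
            reached.add(k)
            stack.extend(chi.get(k, ()))
    return reached


def associate(cube_objects, dimension_objects, chi):
    """Build the set of (cube object, dimension object) association pairs.

    Case 1: the cube of n is linked to the dimension of n itself.
    Case 2: the cube of n is linked to the dimension of every k in chi*(n).
    """
    associations = set()
    for n in cube_objects:
        candidates = {n} | transitive_closure(chi, n)
        for k in candidates:
            if k in dimension_objects:
                associations.add((n, k))
    return associations


SAMPLES = "Biotesting of reference samples"

# Hypothetical chi relation; "Test organism" is an assumed object.
chi = {SAMPLES: {"Biotest"}, "Biotest": {"Test organism"}}
links = associate(
    cube_objects={SAMPLES},
    dimension_objects={SAMPLES, "Biotest", "Test organism"},
    chi=chi,
)
```
      </preformat>
      <p>The iterative traversal visits each object at most once, so it also terminates if the χ relation happens to contain a cycle.</p>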
      <p>Applying the two algorithms in sequence produces the analytical model, which consists of
dimensions, cubes, and the relationship between them (CubeDimensionAssociation), the relation A.
This paper does not consider the problem of constructing dimension hierarchies, leaving it for
future research.</p>
      <p>Conclusion. The software platform for building model-driven systems has been developed by
specialists of the ICM SB RAS to support the consolidation, storage, and analytical processing
of research data. The technological and methodological basis of the platform allowed biophysics
researchers to create the System of research of the state of the soil cover themselves, without
involving IT specialists. The platform enables consistent storage of research data, analysis
within long-term multi-stage research projects, and comparison of research results obtained by
different groups of scientists. To make analytical processing more accessible through software
tools for analytical querying, the task of developing an algorithm that forms the analytical model
of a specialized system from its control model has been set and solved. The presented algorithm is
the theoretical basis for developing an analytical report wizard for specialized model-driven
systems. The next step in creating a native tool for analyzing research data is an algorithm that
generates SQL queries to the system database from user analytical queries expressed in terms of the
analytical model.</p>
      <p>The study was carried out with the financial support of RFBR and the Government of
Krasnoyarsk region, research project No. 18-47-240005.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Broman</surname>
          </string-name>
          and
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Woo</surname>
          </string-name>
          , “
          <article-title>Data Organization in Spreadsheets</article-title>
          ,” Am. Stat.,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Panko</surname>
          </string-name>
          , “
          <article-title>What We Know About Spreadsheet Errors</article-title>
          ,” J. Organ. End User Comput.,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Korobko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korobko</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Kolosova</surname>
          </string-name>
          , “
          <article-title>Constructing the model-driven system for scientific researches support on the original software platform for primary data consolidation</article-title>
          ,” in
          <source>Surveying Geology &amp; Mining Ecology Management (SGEM)</source>
          ,
          <year>2018</year>
          , vol.
          <volume>18</volume>
          , no.
          <issue>2.1</issue>
          , pp.
          <fpage>255</fpage>
          -
          <lpage>262</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>P.</given-names>
            <surname>Alpar</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Schulz</surname>
          </string-name>
          , “
          <article-title>Self-Service Business Intelligence</article-title>
          ,” Bus. Inf. Syst. Eng., vol.
          <volume>58</volume>
          , no.
          <issue>2</issue>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
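          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          ,
          <source>Principles of Data Integration</source>
          .
          <publisher-name>Elsevier</publisher-name>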
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Seidewitz</surname>
          </string-name>
          , “
          <article-title>What models mean</article-title>
          ,” IEEE Softw.
          , vol.
          <volume>20</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>32</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Atkinson</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Kühne</surname>
          </string-name>
          , “
          <article-title>Model-driven development: A metamodeling foundation</article-title>
          ,” IEEE Softw., vol.
          <volume>20</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Korobko</surname>
          </string-name>
          , “
          <article-title>Algorithm of Interface Generation for Model-Driven Data Consolidation System</article-title>
          ,” in
          <source>2018 3rd Russian-Pacific Conference on Computer Technology and Applications (RPC)</source>
          ,
          <year>2018</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Peyton</surname>
          </string-name>
          ,
          <source>Common Warehouse Metamodel</source>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Golfarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          , “
          <article-title>The Dimensional Fact Model: A Conceptual Model for Data Warehouses</article-title>
          ,” Int. J. Coop. Inf. Syst., vol.
          <volume>07</volume>
          , no.
          <issue>02n03</issue>
          , pp.
          <fpage>215</fpage>
          -
          <lpage>247</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>