=Paper=
{{Paper
|id=Vol-2022/paper34
|storemode=property
|title=
On an Approach to Data Integration: Concept, Formal Foundations and Data Model
|pdfUrl=https://ceur-ws.org/Vol-2022/paper34.pdf
|volume=Vol-2022
|authors=Manuk G. Manukyan
|dblpUrl=https://dblp.org/rec/conf/rcdl/Manukyan17
}}
==
On an Approach to Data Integration: Concept, Formal Foundations and Data Model
==
© Manuk G. Manukyan
Yerevan State University,
Yerevan, Armenia
mgm@ysu.am
Abstract. In the frame of an extensible canonical data model, a formalization of the data integration concept is proposed. We provide virtual and materialized integration of data, as well as the possibility to support data cubes with hierarchical dimensions. The considered approach to the formalization of the data integration concept is based on so-called content dictionaries. Namely, by means of these dictionaries we formally define the basic concepts of database theory, metadata about these concepts, and the data integration concept. A computationally complete language is used to extract data from several sources, to create materialized views, and to effectively organize queries on multidimensional data.
In memory of Garush Manukyan, my father.
This work was supported by the RA MES State Committee of Science, within the frame of research project No. 15T-18350.
Keywords: data integration, mediator, data warehouse, data cube, canonical data model, OPENMath,
grid file, XML.
1 Introduction

The emergence of a new paradigm in science and in various applications of information technology (IT) is related to the issues of big data handling [21]. The concept of big data is relatively new and reflects the growing role of data in all areas of human activity, beginning with research and ending with innovative developments in business. Such data is difficult to process and analyze using conventional database technologies. In this connection, the creation of new IT is expected in which data becomes dominant for new approaches to conceptualization, organization, and implementation of systems to solve problems that were previously considered extremely hard or, in some cases, impossible to solve. The unprecedented scale of development in the big data area and the U.S. and European programs related to big data underscore the importance of this trend in IT.

In the above discussed context the problems of data integration are highly topical. Within our approach to data integration an extensible canonical model has been developed [16]. We have published a number of papers devoted to the investigation of virtual and materialized data integration problems, for instance [15, 17]. Our approach to data integration is based on the works of the SYNTHESIS group (IPI RAS) [2, 9-12, 22-25], who are pioneers in the area of justifiable data model mapping for heterogeneous database integration. To support materialized integration of data during the creation of a data warehouse, a new dynamic index structure for multidimensional data was proposed [6], which is based on the grid file [18] concept. We consider the concept of grid files to be one of the adequate formalisms for effective management of big data. Efficient algorithms for storage and access of the grid file directory are proposed in order to minimize memory usage and the complexity of lookup operations. Estimations of the complexity of these algorithms are presented. In fact, the concept of grid files allows queries on multidimensional data to be organized effectively [5] and can be used for efficient data cube storage in data warehouses [13, 19]. A prototype supporting the considered dynamic indexation scheme has been created, and its performance was compared with one of the most popular NoSQL databases [17].

In this paper a formalization of the data integration concept is proposed using the mechanism of content dictionaries (similar to ontologies) of OPENMath [4]. The subjects of the formalization are the basic concepts of database theory, metadata about these concepts, and the data integration concept. The result of the formalization is a set of content dictionaries, constructed as XML DTDs on the basis of OPENMath and used to model database concepts. With this approach, the schema of an integrated database is an instance of the content dictionary of the data integration concept. Within the considered approach, virtual and materialized integration of data is provided, as well as the possibility to support data cubes with hierarchical dimensions. Using OPENMath as the kernel of the canonical data model allows us to use a rich apparatus of computational mathematics for data analysis and management.

The paper is organized as follows: the concept and formal foundations of the considered approach to data integration are presented briefly in Section 2. The canonical data model and the issues of supporting the data integration concept are considered in Section 3. The conclusion is provided in Section 4.
2 Brief Discussion on the Data Integration Approach

Our concept of data integration is based on the idea of integrating arbitrary data models. Based on this assumption, our concept of data integration assumes:
• applying an extensible canonical model;
• constructing justifiable data model mappings for heterogeneous database integration;
• using content dictionaries.

Choosing the extensible canonical model as the integration model allows integrating arbitrary data sources. As we allow integration of arbitrary data sources, a necessity to check the correctness of mappings between data models arises. This is achieved by formalizing data model concepts by means of AMN machines [1] and by using the B-technology to prove the correctness of these mappings.

The content dictionaries are central to our concept of data integration, and semantic information of different types can be defined based on these dictionaries. The concept of content dictionaries allows us to extend the canonical model easily by introducing new concepts into these dictionaries. In other words, canonical model extension is reduced to adding new concepts and metadata about these concepts to the content dictionaries. Our concept of data integration is oriented to virtual and materialized integration of data as well as to supporting data cubes with hierarchical dimensions. It is important that in all cases we use the same data model. The considered data model is an advanced XML data model, which is more flexible than the relational or object-oriented data models. Among XML data models, a distinctive feature of our model is that we use a computationally complete language for data definition. An important feature of our concept is the support of data warehouses on the basis of a new dynamic indexing scheme for multidimensional data. The new index structure developed by us allows OLAP queries on multidimensional data to be organized effectively and can be used for efficient data cube storage in data warehouses. Finally, the modern trends in the development of database systems lead to the use of different divisions of mathematics for data analysis. Within our concept of data integration, this leads to the use of the corresponding content dictionaries of OPENMath.

2.1 Formal Foundations

The above discussed concept of data integration is based on the following formalisms:
• canonical data model;
• OPENMath objects;
• multidimensional indexes;
• domain element calculus.
Below we consider these formalisms in detail.

As we noted, our approach to data integration is based on the works of the SYNTHESIS group. According to the research of this group, each data model is defined by the syntax and semantics of two languages: the data definition language (DDL) and the data manipulation language (DML). They suggested the following principles of synthesis of the canonical model:

• Principle of axiomatic extension of data models
The canonical data model must be extensible. The kernel of the canonical model is fixed. Kernel extension is defined axiomatically. The extension of the canonical data model is formed during the consideration of each new data model by adding new axioms to its DDL to define the logical data dependencies of the source model in terms of the target model, if necessary. The result of the extension should be equivalent to the source data model.

• Principle of commutative mappings of data models
The main principle of mapping of an arbitrary resource data model into the target one (the canonical model) can be reached under the condition that the diagram of DDL (schemas) mapping and the diagram of DML (operators) mapping are commutative.

Figure 1. DDL mapping diagram (SCH_SM is related to SCH_CM by a bijective mapping, DB_SM to DB_CM by a mapping, and schemas to databases by semantic functions)

In Figure 1 we used the following notations: SCH_CM is the set of schemas of the canonical data model; SCH_SM is the set of schemas of the source data model; DB_CM is a database of the canonical data model; DB_SM is a database of the source model.

Figure 2. DML mapping diagram (OP_CM is related to P_SM by an algorithmic refinement mapping, and operators to database transformations by semantic functions)

In Figure 2 we used the following notations: OP_CM is the set of operators of the canonical data model; P_SM is the set of procedures in the DML of the source model.
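Read literally, the commutativity requirement of Figures 1 and 2 can be stated as follows (the symbols below are introduced only for this informal reading and do not appear in the original diagrams): let σ: SCH_SM → SCH_CM denote the schema mapping, δ: DB_SM → DB_CM the database mapping, ρ: OP_CM → P_SM the algorithmic refinement mapping, and sem_SM, sem_CM the semantic functions. Then

  sem_CM(σ(S)) = δ(sem_SM(S))            for every source schema S in SCH_SM;
  sem_CM(o)(δ(d)) = δ(sem_SM(ρ(o))(d))   for every operator o in OP_CM and every source database d.

In other words, mapping a schema (or running the refined operator) on the source side and then moving to the canonical side gives the same result as moving to the canonical side first.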
• Principle of synthesis of the unified canonical data model
The canonical data model is synthesized as a union of extensions.

Figure 3. Canonical data model

2.2 Mathematical Objects Representation

OpenMath is a standard for the representation of mathematical objects, allowing them to be exchanged between computer programs, stored in databases, or published on the Web. The considered formalism is oriented to representing semantic information and is not intended to be used directly for presentation. Any mathematical concept or fact is an example of a mathematical object. OpenMath objects are representations of mathematical objects that admit an XML interpretation.

Formally, an OpenMath object is a labeled tree whose leaves are the basic OpenMath objects. The compound objects are defined in terms of the binding and application of λ-calculus [8]. The type system is built on the basis of types that are defined by themselves and certain recursive rules, whereby the compound types are built from simpler types. To build compound types the following type constructors are used:

• Attribution. If v is a basic object variable and t is a typed object, then attribution (v, type t) is a typed object. It denotes a variable with type t.
• Abstraction. If v is a basic object variable and t, A are typed objects, then binding (λ, attribution (v, type t), A) is a typed object.
• Application. If F and A are typed objects, then application (F, A) is a typed object.

OPENMath is implemented as an XML application. Its syntax is defined by the syntactical rules of XML; its grammar is partially defined by its own DTD. Only the syntactical validity of the representation of OPENMath objects can be provided on the DTD level. To check semantics, in addition to the general rules inherited by XML applications, the considered application defines new syntactical rules. This is achieved by means of the introduction of content dictionaries. Content dictionaries are used to assign formal and informal semantics to all symbols used in OPENMath objects. A content dictionary is a collection of related symbols encoded in XML format. In other words, each content dictionary defines symbols representing a concept from a specific subject domain.

Figure 4. An example of a compound object (the attributed book element with title and author components of type string)
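As an illustration of how such a compound object could be written down, the following is a possible OpenMath-style XML encoding of the book object of Figure 4. It is a sketch only: the content dictionaries db and types and the placement of the symbols are our assumptions, not the paper's actual encoding.

<OMOBJ xmlns="http://www.openmath.org/OpenMath">
  <!-- attribution (book, type application (sequence, ...)) -->
  <OMATTR>
    <OMATP>
      <OMS cd="db" name="type"/>
      <OMA>
        <OMS cd="db" name="sequence"/>
        <OMA>
          <OMS cd="db" name="OneOrMore"/>
          <!-- attribution (title, type string) -->
          <OMATTR>
            <OMATP>
              <OMS cd="db" name="type"/>
              <OMS cd="types" name="string"/>
            </OMATP>
            <OMV name="title"/>
          </OMATTR>
        </OMA>
        <!-- attribution (author, type string) -->
        <OMATTR>
          <OMATP>
            <OMS cd="db" name="type"/>
            <OMS cd="types" name="string"/>
          </OMATP>
          <OMV name="author"/>
        </OMATTR>
      </OMA>
    </OMATP>
    <OMV name="book"/>
  </OMATTR>
</OMOBJ>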
2.3 Dynamic Indexing Scheme for Multidimensional Data

To support the materialized integration of data during the creation of a data warehouse and to apply very complex OLAP queries to it, a new dynamic index structure for multidimensional data was developed (see more details in [6]). The considered index structure is based on the grid file concept. The grid file can be represented as if the space of points were partitioned into an imaginary grid. The grid lines parallel to the axis of each dimension divide the space into stripes. The number of grid lines in different dimensions may vary, and there may be different spacings between adjacent grid lines, even between lines in the same dimension. The intersections of these stripes form cells, which hold references to the data buckets containing the records belonging to the corresponding space partitions.

The weaknesses of the grid file formalism are inefficient memory usage by groups of cells referring to the same data buckets and the possibility of having a large number of overflow blocks for each data bucket. In our approach, we made an attempt to eliminate these defects of the grid file. Firstly, we introduced the concept of the chunk: a set of cells whose corresponding records are stored in the same data bucket (each chunk is represented by a single memory cell with one pointer to the corresponding data bucket). The chunking technique is used to solve the problem of empty cells in the grid file.
Figure 5. An example of a 3-dimensional grid file (axes X, Y, Z; grid partitions u1-u3, v1-v3, w1-w2; the cells reference data buckets)

Secondly, we consider each stripe as a linear hash table, which allows increasing the number of buckets more slowly (for each stripe, the average number of overflow blocks of the chunks crossed by that stripe is less than one). By using this technique we essentially restrict the number of disk operations.

Figure 6. An example of a 2-dimensional modified grid file (chunks formed over the imaginary divisions and stripes; data buckets with overflow blocks)
We compare the directory size of our approach with two techniques for grid file organization proposed in [20]: MDH (multidimensional dynamic hashing) and MEH (multidimensional extendible hashing). The directory sizes for these two techniques are O(r^{1+1/s}) and O(r^{1+1/(ns)}) correspondingly, where r is the total number of records, s is the block size and n is the number of dimensions. In our case the directory size can be estimated as O(nr/s). Compared to the MDH and MEH techniques, the directory size in our approach is thus s·r^{1/s}/n and s·r^{1/(ns)}/n times smaller, correspondingly. We have implemented a data warehouse prototype based on the proposed dynamic indexation scheme and compared its performance with MongoDB [26] (see [17]).
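To get a feeling for these bounds, consider a quick illustrative calculation with assumed values (the numbers are ours, not measurements from the prototype): r = 10^6 records, block size s = 100, and n = 3 dimensions. Then

  MDH directory:  O(r^{1+1/s}) = O(10^{6.06}) ≈ 1.15·10^6 cells;
  MEH directory:  O(r^{1+1/(ns)}) = O(10^{6.02}) ≈ 1.05·10^6 cells;
  our directory:  O(nr/s) = O(3·10^6 / 100) = O(3·10^4) cells;
  reduction factors:  s·r^{1/s}/n = 100·10^{0.06}/3 ≈ 38 and s·r^{1/(ns)}/n = 100·10^{0.02}/3 ≈ 35.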
2.4 Element Calculus

In the frame of our approach to data integration we consider an advanced XML data model as the integration model. In fact, the data model defines the query language [5]. Based on this, a new query language (the domain element calculus) [14] was developed to express declarative queries. A query to an XML database is a formula in the element calculus language. To specify formulas, a variant of the multisorted first-order predicate logic language is used. Notice that the element calculus is developed in the style of the object calculus [10]. In addition, there is a possibility to express queries by means of λ-expressions. Generally, we can combine the considered variants of queries.
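Purely for illustration, a query over the supplier schema S = {S#, Sname, Status, City} considered in Section 3.2 might be written in the generic style of a domain calculus as

  { s | S(s) ∧ s.City = "Yerevan" ∧ s.Status > 10 }

i.e., the set of supplier elements located in Yerevan whose status exceeds 10. This is only a sketch of the idea that a query is a formula whose free variable ranges over elements; it is not the concrete syntax of [14].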
3 Extensible Canonical Data Model

The canonical model kernel is an advanced XML data model: a minor extension of OPENMath to support the concept of databases. The main difference between our XML data model and analogous XML data models (in particular, XML Schema) is that the concept of data types in our case is interpreted conventionally (as a set of values and a set of operations). More details about the type system of XML Schema can be found in [3]. A data model concept formalized on the kernel level is referred to as a kernel concept.

3.1 Kernel Concepts

In the frame of the canonical data model we distinguish basic and compound concepts. Formally, a kernel concept is a labeled tree whose leaves are basic kernel concepts. Examples of basic kernel concepts are constants, variables, and symbols (for instance, reserved words). The compound concepts are defined in terms of the binding and application of λ-calculus. The type system is built analogously to that of OPENMath.

3.2 Extension Principle

As we noted above, the canonical data model must be extensible. The extension of the canonical model is formed during the consideration of each new data model by adding new concepts to its DDL to define the logical data dependencies of the source model in terms of the target model, if necessary. Thus, the canonical model extension assumes defining new symbols. The extension result must be equivalent to the source data model. To apply a symbol on the canonical model level the following rule has been proposed:

Concept symbol ContextDefinition.

For example, to support the concept of key of the relational data model, we have expanded the canonical model with the symbol key. Let us consider a relational schema example:

S = {S#, Sname, Status, City}.

The equivalent definition of this schema by means of the extended kernel is considered below:

attribution (S, type TypeContext, constraint ConstraintContext)
TypeContext: application (sequence, ApplicationContext)
ApplicationContext: attribution (S#, type int), attribution (Sname, type string), attribution (Status, type int), attribution (City, type string)
ConstraintContext: attribution (name, key S#).

It is essential that we use a computationally complete language to define the context [14]. As a result of this approach, the usage of new symbols in the DDL does not lead to any changes in the DDL parser.
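For illustration only, the constraint part of this definition could be rendered in OpenMath-style XML roughly as below; the content dictionary name db and the way the constraint is attached to S are our assumptions, not the paper's actual representation:

<!-- constraint part of: attribution (S, ..., constraint attribution (name, key S#)) -->
<OMATTR>
  <OMATP>
    <OMS cd="db" name="constraint"/>
    <OMA>
      <OMS cd="db" name="key"/>
      <OMV name="S#"/>
    </OMA>
  </OMATP>
  <OMV name="S"/>
</OMATTR>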
3.3 Semantic Level

The canonical model is an XML application. Only the syntactical validity of the representation of the canonical model concepts can be provided on the DTD level. To check semantics, the considered application defines new syntactical rules. We define these syntactical rules in content dictionaries.

3.4 Content Dictionaries

The content dictionary is the main formalism to define semantical information about the concepts of the canonical data model. In other words, content dictionaries are used to assign formal and informal semantics to all concepts of the canonical data model. A content dictionary is a collection of related symbols, encoded in XML format, and it fixes the "meaning" of concepts independently of the application. Three kinds of content dictionaries are considered:
• content dictionaries to define basic concepts (symbols);
• content dictionaries to define the signatures of basic concepts (mathematical symbols) in order to check the semantic validity of their representation;
• a content dictionary to define the data integration concept.

Supporting the above considered content dictionaries assumes developing the corresponding DTDs. Instances of such DTDs are XML documents. An instance of the DTD of a content dictionary of basic concepts is used to assign formal and informal semantics to those objects. Finally, an instance of the DTD of a content dictionary of signatures of basic concepts contains metainformation about these concepts, and an instance of the DTD of the content dictionary of the data integration concept is metadata for integrating databases.
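The fragment below sketches what the skeleton of such a DTD might declare, loosely following the vocabulary of OpenMath content dictionaries (CD, CDDefinition, and so on). The element set and content models are an illustrative assumption, not the paper's actual DTDs:

<!-- Illustrative skeleton of a content-dictionary DTD; element names follow
     the OpenMath CD vocabulary, the exact content models are assumed. -->
<!ELEMENT CD (CDName, Description, CDDefinition+)>
<!ELEMENT CDName (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT CDDefinition (Name, Description, Example*)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Example (#PCDATA)>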
3.5 Data Integration Concept

In the frame of our approach to data integration we consider virtual as well as materialized data integration issues within a canonical model. Therefore, we should formalize the concepts of this subject area, such as mediator, data warehouse and data cube. We model these concepts by means of the following XML elements: dbsch, med, whse and cube.

Mediator. The content of the element dbsch is based on the kernel attribution concept, and the element has an attribute name. By means of this concept we can model the schemas of databases. The value of the attribute name is the DB's name. The content of the element med is based on the elements msch, wrapper, constraint, and it also has an attribute name. The value of this attribute is the mediator's name. The element msch is interpreted analogously to the element dbsch; note only that this element is used for modelling the schemas of a mediator. The content of the elements wrapper and constraint is based on the kernel application concept. By means of the wrapper element, mappings from the source models into the canonical model are defined. The integrity constraints on the level of the mediator are the values of the constraint elements. It is important that we are using a computationally complete language for defining the mappings and integrity constraints. Below, an example of a mediator for an automobile company database is adduced [5], which is an instance of the content dictionary of the data integration concept. It is assumed that the mediator with schema AutosMed = {SerialNo, Model, Color} integrates two relational sources: Cars = {SerialNo, Model, Color}, and Autos = {Serial, Model} together with Colors = {Serial, Color}.

[XML listing: schema definition of Cars; schema definition of Autos; schema definition of Colors; AutosMed: the schema for the mediator is defined]

It is essential that we use a computationally complete language to model the mediator's work.
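To make the structure concrete, the following hypothetical fragment sketches how such a mediator instance might look; the nesting and the comment placeholders are our assumptions based on the prose above, not the paper's actual listing:

<med name="AutosMed">
  <msch>
    <!-- mediator schema: SerialNo, Model, Color -->
  </msch>
  <wrapper>
    <!-- mapping of the source Cars (SerialNo, Model, Color) into the canonical model -->
  </wrapper>
  <wrapper>
    <!-- mapping of the sources Autos (Serial, Model) and Colors (Serial, Color),
         joined on Serial, into the canonical model -->
  </wrapper>
  <constraint>
    <!-- integrity constraints of the mediator, e.g. SerialNo is a key -->
  </constraint>
</med>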
Data warehouse. As we noted above, the considered approach to supporting data warehousing is based on the grid file concept and is interpreted by means of the element whse. This element is defined as a kernel application concept, is based on the elements wsch, extractor, grid, and has an attribute name. The value of this attribute is the name of the data warehouse. The element wsch is interpreted in the same way as the element msch for the mediator. The element extractor is defined as a kernel application concept and is used to extract data from the source databases. The element grid is defined as a kernel application concept and is based on the elements dim and chunk, by which the grid file concept is modelled. To model the concept of a stripe of a grid file we introduced an empty element stripe, which is described by means of five attributes: ref_to_chunk, min_val, max_val, rec_cnt and chunk_cnt. The values of the attributes ref_to_chunk are pointers to the chunks crossed by each stripe. By means of the min_val (lower boundary) and max_val (upper boundary) attributes we define the "widths" of the stripes. The values of the attributes rec_cnt and chunk_cnt are the total number of records in a stripe and the number of chunks crossed by it, correspondingly. To model the concept of a chunk we introduced an element chunk, which is based on the empty element avg and is described by means of four attributes: id of type ID, qty, ref_to_db and ref_to_chunk. The values of the attributes ref_to_db and ref_to_chunk are pointers to data blocks and to other chunks, correspondingly. The value of the attribute qty is the number of different points of the considered chunk for a fixed dimension. The element avg is described by means of two attributes: value and dim. The values of the value attributes are used during reorganization of the grid file and contain the average coordinates of the points corresponding to the records of the considered chunk, for each dimension. The value of the attribute dim is the name of the corresponding dimension. To model the concept of a dimension of a grid file we introduced an element dim, which is based on the empty element stripe and has a single attribute name, i.e. the dimension name.
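The hypothetical fragment below illustrates how a grid description could be assembled from these elements; the attribute names follow the prose above, while the concrete values and the wsch/extractor placeholders are invented for the illustration:

<whse name="SalesWarehouse">
  <wsch><!-- warehouse schema, interpreted like msch --></wsch>
  <extractor><!-- extraction of data from the source databases --></extractor>
  <grid>
    <dim name="price">
      <stripe ref_to_chunk="c1" min_val="0" max_val="100" rec_cnt="240" chunk_cnt="1"/>
      <stripe ref_to_chunk="c1 c2" min_val="100" max_val="500" rec_cnt="310" chunk_cnt="2"/>
    </dim>
    <chunk id="c1" qty="2" ref_to_db="b7" ref_to_chunk="c2">
      <avg value="86.4" dim="price"/>
    </chunk>
    <chunk id="c2" qty="1" ref_to_db="b8" ref_to_chunk="">
      <avg value="212.0" dim="price"/>
    </chunk>
  </grid>
</whse>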
Data cube. Materialized integration of data assumes the creation of data warehouses. Our approach to creating data warehouses is mainly oriented to supporting data cubes. Using data warehousing technologies in OLAP applications is very important [5]. Firstly, the data warehouse is a necessary tool to organize and centralize corporate information in order to support OLAP queries (the source data are often distributed over heterogeneous sources). Secondly, significant is the fact that OLAP queries, which are very complex in nature and involve large amounts of data, require too much time to perform in a traditional transaction processing environment. To model the data cube concept we introduced an element cube, which is interpreted by means of the following elements: felement, delement, fcube, rollup, mview and granularity. In typical OLAP applications, a collection of data called the fact_table, which represents events or objects of interest, is used [5]. Usually, the fact_table contains several attributes representing dimensions, and one or more dependent attributes that represent properties of the point as a whole. To model the fact_table concept we introduced an element felement, which is based on the kernel attribution concept. To model the concept of a dimension we introduced an element delement. This element is based on the empty element element and is described by means of the attribute name. The value of the attribute name is the dimension name. The creation of the data cube requires generation of the power set (the set of all subsets) of the aggregation attributes. To implement the formal data cube concept, the CUBE operator is considered in the literature [7]. In addition to the CUBE operator, in [7] the operator ROLLUP is introduced as a special variety of the CUBE operator which produces the additional aggregated information only if it aggregates over a tail of the sequence of grouping attributes. To support these operators we introduced the cube and rollup symbols, correspondingly. In this context, it is assumed that all independent attributes are grouping attributes.
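As an illustration (the attribute names are ours): for a fact table with grouping attributes (D1, D2, D3) and a dependent measure, CUBE materializes aggregates for all 2^3 = 8 subsets of the grouping attributes,

  (D1, D2, D3), (D1, D2), (D1, D3), (D2, D3), (D1), (D2), (D3), ( ),

whereas ROLLUP materializes only the aggregates obtained by dropping a tail of the sequence,

  (D1, D2, D3), (D1, D2), (D1), ( ).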
For some dimensions there are many degrees of granularity that could be chosen for a grouping on that dimension. When the number of choices for grouping along each dimension grows, it becomes ineffective to store the results of aggregating based on all the subsets of groupings. Thus, it becomes reasonable to introduce materialized views.

Figure 7. Examples of lattices of partitions for time intervals (Days, Weeks, Months, Quarters, Years, All) and automobile dealers (Dealer, City, State, All)

Materialized views. A materialized view is the result of some query which is stored in the database and which does not contain all aggregated values. To model the materialized view concept we introduce an element mview, which is interpreted by means of an element view, and the latter is based on the kernel attribution concept. When implementing a query over a hierarchical dimension, the problem of choosing an effective materialized view arises. In other words, if we have aggregated values with respect to the granularities Months and Quarters, then for aggregation with respect to the granularity Years it will be effective to apply the query over the materialized view with granularity Quarters. As in [5], we also consider the lattice (a partially ordered set) as a relevant construction to formalize the hierarchical dimension. The lattice nodes correspond to the units of the partitions of a dimension. In general, the set of partitions of a dimension is a partially ordered set. We say that partition P1 precedes partition P2, written P1 ≤ P2, if and only if there is a path from node P1 to node P2. Based on the lattices for each dimension we can define a lattice for all the possible materialized views of a data cube which are created by means of grouping according to some partition in each dimension. Let V1 and V2 be views; then V1 ≤ V2 if and only if, for each dimension of V1 with partition P1 and the analogous dimension of V2 with partition P2, P1 ≤ P2 holds. Finally, let V be a view and Q be a query. We can implement this query over the considered view if and only if V ≤ Q.
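For example (an illustration of ours, assuming the usual partial order of the lattices in Figure 7, in which weeks aggregate only into All while days aggregate into both weeks and months): a view V materialized with the granularities (Months, Dealer) can answer a query Q over (Years, City), since Months ≤ Years and Dealer ≤ City imply V ≤ Q; a view materialized with (Weeks, Dealer) cannot answer the same query, because Weeks ≤ Years does not hold.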
To model the concept of a hierarchical dimension we introduced an element granularity, which is based on an empty element partition, and the latter is described by means of the attribute name. The value of the attribute name is the name of the granularity. Below, an example of a data cube for an automobile company database is adduced [5], which is an instance of the content dictionary of the data integration concept. We consider the following:

[XML listing: schema definition of Sales (dimension attributes and the dependent attribute Value; the set of partitions of each dimension); definition of materialized view Sales1; definition of materialized view Sales2]

The detailed discussion of the issues connected with applying the query language to integrated data is beyond the topic of this paper. Below, the XML formalization of the data integration concept is presented.

Figure 8. DTD for the formalization of the data integration concept

4 Conclusion

The data integration concept formalization problems were investigated. The outcome of this investigation is a definition language for integrable data, which is based on the formalization of the data integration concept using the mechanism of the content dictionaries of OPENMath. Supporting the concept of data integration is achieved by the creation of content dictionaries, each of which contains formal definitions of the concepts of a specific area of databases.

The data integration concept is represented as a set of XML DTDs which are based on the OPENMath formalism. By means of such DTDs, the basic concepts of database theory, metadata about these concepts, and the data integration concept were formalized. Within our approach to data integration, an integrated schema is represented as an XML document which is an instance of the XML DTD of the data integration concept. Thus, modelling of the integrated data based on the OPENMath formalism leads to the creation of the corresponding XML DTDs.

By means of the developed content dictionary of the data integration concept we model the mediator and the data warehouse concepts. The considered approach provides virtual and materialized integration of data as well as the possibility to support data cubes with hierarchical dimensions. Within our concept of the data cube, the operators CUBE and ROLLUP are implemented. If necessary, new super-aggregate operators can be defined in integrated data schemas. We use a computationally complete language to create the schemas of integrated data. Applying the query language to the integrated data generates a reduction problem; supporting the query language over such data requires additional investigation.

Finally, modern trends in the development of database systems lead to the application of different divisions of mathematics to data analysis and management. In the frame of our approach to data integration, this leads to the use of the corresponding content dictionaries of OPENMath.
References

[1] Abrial, J.-R.: The B-Book: Assigning Programs to Meanings. Cambridge University Press (1996)
[2] Briukhov, D. O., Vovchenko, A. E., Zakharov, V. N., Zhelenkova, O. P., Kalinichenko, L. A., Martynov, D. O., Skvortsov, N. A., Stupnikov, S. A.: The Middleware Architecture of the Subject Mediators for Problem Solving over a Set of Integrated Heterogeneous Distributed Information Resources in the Hybrid Grid-Infrastructure of Virtual Observatories. Informatics and Applications, 2 (1), pp. 2-34 (2008)
[3] Date, C. J.: An Introduction to Database Systems. Addison Wesley, USA (2004)
[4] Dewar, M.: OpenMath: An Overview. ACM SIGSAM Bulletin, 34 (2) (2000)
[5] Garcia-Molina, H., Ullman, J., Widom, J.: Database Systems: The Complete Book. Prentice Hall, USA (2009)
[6] Gevorgyan, G. R., Manukyan, M. G.: Effective Algorithms to Support Grid Files. RAU Bulletin, (2), pp. 22-38 (2015)
[7] Gray, J., Bosworth, A., Layman, A., Pirahesh, H.: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. In ICDE, pp. 152-159 (1996)
[8] Hindley, J. R., Seldin, J. P.: Introduction to Combinators and λ-Calculus. Cambridge University Press (1986)
[9] Kalinichenko, L. A.: Methods and Tools for Equivalent Data Model Mapping Construction. In EDBT, pp. 92-119, Springer (1990)
[10] Kalinichenko, L. A.: Integration of Heterogeneous Semistructured Data Models in the Canonical One. In RCDL, pp. 3-15 (1990)
[11] Kalinichenko, L. A., Stupnikov, S. A.: Constructing of Mappings of Heterogeneous Information Models into the Canonical Models of Integrated Information Systems. In Proc. of the 12th East-European Conference, pp. 106-122 (2008)
[12] Kalinichenko, L., Stupnikov, S.: Synthesis of the Canonical Models for Database Integration Preserving Semantics of the Value Inventive Data Models. In Proc. of the 16th East European Conference, LNCS 7503, pp. 223-239 (2012)
[13] Luo, C., Hou, W. C., Wang, C. F., Wang, H., Yu, X.: Grid File for Efficient Data Cube Storage. Computers and their Applications, pp. 424-429 (2006)
[14] Manukyan, M. G.: Extensible Data Model. In ADBIS'08, pp. 42-57 (2008)
[15] Manukyan, M. G., Gevorgyan, G. R.: An Approach to Information Integration Based on the AMN Formalism. In First Workshop on Programming the Semantic Web, pp. 1-13 (2012). Available: https://web.archive.org/web/20121226215425/http://www.inf.puc-rio.br/~psw12/program.html
[16] Manukyan, M. G.: Canonical Data Model: Construction Principles. In iiWAS'14, pp. 320-329, ACM (2014)
[17] Manukyan, M. G., Gevorgyan, G. R.: Canonical Data Model for Data Warehouse. In New Trends in Databases and Information Systems, Communications in Computer and Information Science, 637, pp. 72-79 (2016)
[18] Nievergelt, J., Hinterberger, H., Sevcik, K. C.: The Grid File: An Adaptable, Symmetric, Multikey File Structure. ACM Transactions on Database Systems, 9 (1), pp. 38-71 (1984)
[19] Papadopoulos, A. N., Manolopoulos, Y., Theodoridis, Y., Tsoras, V.: Grid File (and Family). In Encyclopedia of Database Systems, pp. 1279-1282 (2009)
[20] Regnier, M.: Analysis of Grid File Algorithms. BIT, 25 (2), pp. 335-358 (1985)
[21] Sharma, S., Tim, U. S., Wong, J., Gadia, S., Sharma, S.: A Brief Review on Leading Big Data Models. Data Science Journal, (13), pp. 138-157 (2014). doi:10.2481/dsj.14-041
[22] Stupnikov, S. A.: A Verifiable Mapping of a Multidimensional Array Data Model into an Object Data Model. Informatics and Applications, 7 (3), pp. 22-34 (2013)
[23] Stupnikov, S. A., Vovchenko, A.: Combined Virtual and Materialized Environment for Integration of Large Heterogeneous Data Collections. In Proc. of RCDL 2014, CEUR Workshop Proceedings, 1297, pp. 339-348 (2014)
[24] Stupnikov, S. A., Miloslavskaya, N. G., Budzko, V.: Unification of Graph Data Models for Heterogeneous Security Information Resources' Integration. In Proc. of the Int. Conf. on Open and Big Data OBD 2015 (joint with the 3rd Int. Conf. on Future Internet of Things and Cloud, FiCloud 2015), IEEE, pp. 457-464 (2015)
[25] Zakharov, V. N., Kalinichenko, L. A., Sokolov, I. A., Stupnikov, S. A.: Development of Canonical Information Models for Integrated Information Systems. Informatics and Applications, 1 (2), pp. 15-38 (2007)
[26] MongoDB. https://www.mongodb.org