On an Approach to Data Integration: Concept, Formal Foundations and Data Model © Manuk G. Manukyan Yerevan State University, Yerevan, Armenia mgm@ysu.am Abstract. In the frame of an extensible canonical data model a formalization of data integration con- cept is proposed. We provide virtual and materialized integration of data as well as the possibility to support data cubes with hierarchical dimensions. The considered approach of formalization of data integration con- cept is based on the so-called content dictionaries. Namely, by means of these dictionaries we are formally defining basic concepts of database theory, metadata about these concepts, and the data integration concept. A computationally complete language is used to extract data from several sources, to create the materialized view, and to effectively organize queries on the multidimensional data. In memory of Garush Manukyan, my father. This work was supported by the RA MES State Committee of Science, in the frames of the research project N 15T-18350. Keywords: data integration, mediator, data warehouse, data cube, canonical data model, OPENMath, grid file, XML. which is based on the grid files [18] concept. We con- 1 Introduction sider the concept of grid files as one of the adequate The emergence of a new paradigm in science and formalisms for effective management of big data. Effi- various applications of information technology (IT) are cient algorithms for storage and access of that directory related to issues of big data handling [21]. The concept are proposed in order to minimize memory usage and of big data is relatively new and involves the growing lookup operations complexities. Estimations of com- role of data in all areas of human activity beginning plexities for these algorithms are presented. In fact, the with research and ending with innovative developments concept of grid files allows to effectively organize que- in business. Such data is difficult to process and analyze ries on multidimensional data [5] and can be used for using conventional database technologies. In this con- efficient data cubes storage in data warehouses [13,19]. nection, the creation of new IT is expected in which A prototype to support the considered dynamic indexa- data becomes dominant for new approaches to concep- tion scheme has been created and its performance was tualization, organization, and implementation of sys- compared with one of the most demanded NoSQL data- tems to solve problems that were previously considered bases [17]. extremely hard or, in some cases, impossible to solve. In this paper a formalization of the data integration Unprecedented scale of development in the big data concept is proposed using a mechanism of the content area and the U.S. and European programs related to big dictionaries (similarly ontologies) of the OPENMath data underscore the importance of this trend in IT. [4]. Subjects of the formalization are the basic concepts In the above discussed context the problems of da- of database theory, metadata about these concepts and ta integration are very actual. Within of our approach to the data integration concept. The result of the formaliza- data integration an extensible canonical model has been tion are a set of content dictionaries, constructed as developed [16]. We have published a number of papers XML DTDs on the base of OPENMath and are used to that are devoted to the investigation of data virtual and model the databases concepts. With this approach, materialized data integration problems, for instance [15, schema of an integrated database is an instance of con- 17]. Our approach to data integration is based on the tent dictionary of the data integration concept. Within works of the SYNTHESIS group (IPI RAS) [2, 9–12, the considered approach is provided virtual and materi- 22–25], who are pioneers in the area of justifiable data alized integration of data as well as the possibility to models mapping for heterogeneous databases integra- support data cubes with hierarchical dimensions. Using tion. To support materialized integration of data during OPENMath as the kernel of the canonical data model creation of a data warehouse a new dynamic index allows us to use a rich apparatus of computational structure for multidimensional data was proposed [6] mathematics for data analysis and management. The paper is organized as follows: Concept and formal foundations of the considered approach to data Proceedings of the XIX International Conference integration are presented briefly in Section 2. Canonical “Data Analytics and Management in Data Intensive data model and issues to support the data integration Domains” (DAMDID/RCDL’2017), Moscow, Russia, October 10–13, 2017 206 concept are considered in Section 3. The conclusion is on the works of the SYNTHESIS group. According to provided in Section 4. the research of this group, each data model is defined by syntax and semantics of two languages, data definition 2 Brief Discussion on Data Integration Ap- language (DDL) and data manipulation language proach (DML). They suggested the following principles of syn- The basis of our concept to data integration is based thesis of the canonical model: on the idea of integrating arbitrary data models. Based • Principle of axiomatic extension of data models on this assumption our concept of data integration as- The canonical data model must be extensible. The sumes: kernel of the canonical model is fixed. Kernel extension • applying extensible canonical model; is defined axiomatically. The extension of the canonical • constructing justifiable data models mapping data model is formed during the consideration of each for heterogeneous databases integration; new data model by adding new axioms to its DDL to • using content dictionaries. define logical data dependencies of the source model in terms of the target model if necessary. The results of the Choosing the extensible canonical model as integra- extension should be equivalent to the source data mod- tion model allows integrating arbitrary data sources. As el. we allow integration of arbitrary data sources a necessi- ty to check mapping correctness between data models • Principle of commutative mappings of data arises. It is reached by formalization of data model con- models cepts by means of AMN machines [1] and using B- The main principle of mapping of an arbitrary re- technology to prove correctness of these mappings. source data model into the target one (the canonical The content dictionaries are central to our concept model) could be reached under the condition that the of data integration and semantical information of differ- diagram of DDL (schemas) mapping and the diagram of ent types can be defined based on these dictionaries. DML (operators) mapping are commutative. The concept of content dictionaries allows us to extend semantic the canonical model by means of introducing new con- function DB_CM SCH_CM cepts in these dictionaries easily. In other words, canon- ical model extension only is reduced to adding new concepts and metadata about these concepts in content dictionaries. Our concept to data integration is oriented mapping mapping bijective as virtual and materialized integration of data as well as to support data cubes with hierarchical dimensions. It is important that in all cases we use the same data model. The considered data model is an advanced XML data model which is a more flexible data model than rela- SCH_SM semantic DB_SM tional or object-oriented data models. Among XML function data models, a distinctive feature of our model is that we use a computationally complete language for data Figure 1 DDL mapping diagram definition. An important feature of our concept is the In Figure 1 we used the following notations: support of data warehouses on the base of a new dy- SCH_CM: Set of schemas of the canonical data model; namic indexing scheme for multidimensional data. A SCH_SM: Set of schemas of the source data model; new index structure developed by us allows to organize DB_CM: Database of the canonical data model; DB_SM: effectively OLAP-queries on multidimensional data and Database of the source model. can be used for efficient data cubes storage in data warehouses. Finally, the modern trends of the develop- semantic OP_CM function DB_CM DB_CM ment of database systems lead to use of different divi- sions of mathematics to data analysis. Within of our concept to data integration, this leads to the use of cor- algorithmic responding content dictionaries of the OPENMath. refinement mapping 2.1 Formal Foundations The above discussed concept to data integration is based on the following formalisms: P_SM semantic DB_SM DB_SM • canonical data model; function • OPENMath objects; • multidimensional indexes; Figure 2 DML mapping diagram • domain element calculus. Below we will consider these formalisms in detail. In Figure 2 we used the following notations: OP_CM: As we noted, our approach to data integration is based Set of operators of the canonical data model; P_SM: Set 207 of procedures in DML of the source model. used to assign formal and informal semantics to all symbols used in the OPENMath objects. A content dic- • Principle of synthesis of unified canonical data tionary is a collection of related symbols encoded in model XML format. In other words, each content dictionary The canonical data model is synthesized as a union defines symbols representing a concept from the specif- of extensions. ic subject domain. attr book type app sequence app attr OneOrMore attr title type string Figure 3 Canonical data model 2.2 Mathematical Objects Representation author type string Figure 4 An example of compound object The OpenMath is a standard for representation of the mathematical objects, allowing them to be ex- 2.3 Dynamic Indexing Scheme for Multidimensional changed between computer programs, stored in data- Data bases, or published on the Web. The considered formal- ism is oriented to represent semantic information and is To support the materialized integration of data dur- not intended to be used directly for presentation. Any ing the creation of a data warehouse and to apply very mathematical concept or fact is an example of mathe- complex OLAP-queries on it a new dynamic index matical object. The OpenMath objects are such repre- structure for multidimensional data was developed (see sentation of mathematical objects which assume an more details in [6]). The considered index structure is XML interpretation. based on the grid file concept. The grid file can be rep- Formally, an OpenMath object is a labeled tree resented as if the space of points is partitioned into an whose leaves are the basic OpenMath objects. The imaginary grid. The grid lines parallel to axis of each compound objects are defined in terms of binding and dimension divide the space into stripes. The number of application of λ-calculus [8]. The type system is built grid lines in different dimensions may vary, and there on the basis of types that are defined by themselves and may be different spacings between adjacent grid lines, certain recursive rules, whereby the compound types are even between lines in the same dimension. Intersections built from simpler types. To build compound types the of these stripes form cells which hold references to data following type constructors are used: buckets containing records belonging to corresponding space partitions. • Attribution. If v is a basic object variable and t is a typed object, then attribution (v, type t) is a typed The weaknesses of the grid file formalism concept object. It denotes a variable with type t. are non-efficient memory usage by groups of cells re- ferring to the same data buckets and the possibility of • Abstraction. If v is a basic object variable and t, A having a large number of overflow blocks for each data are typed objects, then binding (λ, attribution (v, buckets. In our approach, we made an attempt to elimi- type t), A) is a typed object. nate these defects of the grid file. Firstly, we introduced • Application. If F and A are typed objects, then ap- the concept of the chunk: set of cells whose correspond- plication (F, A) is a typed object. ing records are stored in the same data bucket (repre- The OPENMath is implemented as an XML applica- sented by single memory cells with one pointer to the tion. Its syntax is defined by syntactical rules of XML, corresponding data buckets). Chunking technique is its grammar is partially defined by its own DTD. Only used to solve the problem of empty cells in the grid file. syntactical validity of the OPENMath objects represen- tation can be provided on the DTD level. To check se- mantics, in addition to general rules inherited by XML applications, the considered application defines new syntactical rules. This is achieved by means of introduc- tion of content dictionaries. Content dictionaries are 208 Data 2.4 Element Calculus buckets Y In the frame of our approach to data integration as integration model we consider an advanced XML data model. In fact, data model defines the query language [5]. Based on this, to give declarative queries a new w1 query language (domain element calculus) [14] was developed. A query to XML - database is a formula in w2 element calculus language. To specify formulas a vari- u1 ant of the multisorted first order predicate logic lan- X u2 guage is used. Notice that element calculus is developed u3 in the style of object calculus [10]. In addition, there is a v1 v2 v3 possibility to give queries by means of λ-expressions. Generally, we can combine the considered variants of queries. Grid Z partitions 3 Extensible Canonical Data Model Figure 5 An example of 3-dimensional grid file The canonical model kernel is an advanced XML Secondly, we consider each stripe as a linear hash data model: a minor extension of the OPENMath to table which allows increasing the number of buckets support the concept of databases. The main difference more slowly (for each stripe, the average number of between our XML data model and analogous XML data overflow blocks of chunks crossed by that stripe is less models (in particular, XML Schema) is that the concept than one). By using this technique we essentially restrict of data types in our case is interpreted conventionally the number of disk operations. (set of values, set of operations). More details about the type system of the XML Schema can be found in [3]. A Chunk Imaginary divisions data model concept formalized on the kernel level is s referred to as kernel concept. 3.1 Kernel Concepts Stripes In the frame of canonical data model we distinguish basic and compound concepts. Formally, a kernel con- cept is a labeled tree whose leaves are basic kernel con- cepts. Examples of basic kernel concepts are constants, variables, and symbols (for instance, reserved words). The compound concepts are defined in terms of binding and application of λ-calculus. The type system is built analogously to that in OPENMath. Data ... 3.2 Extension Principle buckets Overflow As we noted above the canonical data model must blocks be extensible. The extension of the canonical model is Figure 6 An example of 2-dimensional modified grid formed during the consideration of each new data mod- file el by adding new concepts to its DDL to define logical We perform comparison of directory size by our data dependencies of the source model in terms of the approach with two techniques for grid file organization target model if necessary. Thus, the canonical model proposed in [20]: MDH (multidimensional dynamic extension assumes defining new symbols. The exten- hashing) and MEH (multidimensional extendible hash- sion result must be equivalent to the source data model. ing). Directory sizes for both of these techniques are: To apply a symbol on the canonical model level the 𝟏𝟏 𝒏𝒏−𝟏𝟏 following rule has been proposed: 𝑶𝑶 �𝒓𝒓𝟏𝟏+𝒔𝒔 � and 𝑶𝑶 �𝒓𝒓𝟏𝟏+𝒏𝒏𝒏𝒏−𝟏𝟏 � correspondingly, where r is Concept symbol ContextDefinition. the total number of records, s is the block size and n is For example, to support the concept of key of relational the number of dimensions. In our case the directory data model, we have expanded the canonical model 𝑛𝑛𝑛𝑛 size can be estimated as 𝑂𝑂 � �. Compared to MDH with the symbol key. Let us consider a relational sche- 𝑠𝑠 and MEH techniques, the directory size in our approach ma example: 𝟏𝟏 𝒏𝒏−𝟏𝟏 𝒔𝒔𝒓𝒓 𝒔𝒔 𝒔𝒔𝒓𝒓𝒏𝒏𝒏𝒏−𝟏𝟏 S = {S#, Sname, Status, City}. is and times smaller correspondingly. We 𝒏𝒏 𝒏𝒏 The equivalent definition of this schema by means have implemented a data warehouse prototype based on of extended kernel is considered below: the proposed dynamic indexation scheme and compared attribution (S, type TypeContext, constraint its performance with MongoDB [26] (see in [17]). ConstraintContext) 209 TypeContext application (sequence, the kernel attribution concept and has an attribute name. ApplicationContext) By means of this concept we can model schemas of ApplicationContext attribution (S#, type int), databases. The value of attribute name is the DB's attribution (Sname, type string), name. The content of element med is based on the ele- attribution (Status, type int), ments msch, wrapper, constraint and has an attribute attribution (City, type string)) name. The value of this attribute is the mediator's name. ConstraintContext attribution (name, key S#). The element msch is interpreted analogously to element It is essential that we use a computationally com- dbsch. Only note that this element is used during mod- plete language to define the context [14]. As a result of elling schemas of a mediator. The content of elements such approach, usage of new symbols in the DDL does wrapper and constraint is based on the kernel applica- not lead to any changes in the DDL parser. tion concept. By means of wrapper element mappings from source models into a canonical model are defined. 3.3 Semantic Level The integrity constraints on the level of mediator are the values of the constraints elements. It is important that The canonical model is an XML application. Only we are using a computationally complete language for syntactical validity of the canonical model concepts defining the mappings and integrity constraints. Below, representation can be provided on the DTD level. To an example of a mediator for an automobile company check semantics the considered application defines new database is adduced [5] which is an instance of a con- syntactical rules. We define these syntactical rules in tent dictionary of data integration concept. It is assumed content dictionaries. that the mediator with schema AutosMed = {SerialNo, Model, Color} is integrate two relational sources: Cars 3.4 Content Dictionaries = {SerialNo, Model, Color} and Autos = {Serial, Mod- The content dictionary is the main formalism to de- el}, Colors = {Serial, Color}. fine semantical information about concepts of the ca- nonical data model. In other words, content dictionaries are used to assign formal and informal semantics to all schema definition of Cars concepts of the canonical data model. A content dic- tionary is a collection of related symbols, encoded in XML format and fixes the “meaning” of concepts inde- pendently of the application. Three kinds of content schema definition of Autos dictionaries are considered: • content dictionaries to define basic concepts (sym- schema definition of Colors bols); • content dictionaries to define a signature of basic concepts (mathematical symbols) to check the se- mantic validity of their representation; AutosMed: schema for mediator is defined • content dictionary to define a data integration con- cept. Supporting the above considered content dictionar- ies assumes to develop corresponding DTDs. Instances of such DTDs are XML documents. An instance of a DTD of a content dictionary of basic concepts is used to assign formal and informal semantics of those objects. Finally, an instance of a DTD of a content dictionary of a signature of basic concepts contains metainformation about these concepts, and an instance of a DTD of a content dictionary of a data integration concept is a metadata for integrating databases. 3.5 Data Integration Concept In the frame of our approach to data integration we It is essential that, we use a computationally com- consider virtual as well as materialized data integration plete language to model the mediator work. issues within a canonical model. Therefore, we should formalize the concepts of this subject area such as me- Data warehouse. As we noted above the considered diator, data warehouse and data cube. We are model- approach to support data warehousing is based on the ling these concepts by means of the following XML grid file concept and is interpreted by means of element elements: dbsch, med, whse and cube. whse. This element is defined as kernel application concept and is based on the elements wsch, extractor, Mediator. The content of element dbsch is based on grid and has an attribute name. The value of this attrib- 210 ute is the name of the data warehouse. The element described by means of attribute name. Value of attribute wsch is interpreted in the same way as the element msch name is the dimension name. The creation of the data for the mediator. The element extractor is defined as cube requires generation of the power set (set of all sub- kernel application concept and is used to extract data set) of the aggregation attributes. To implement the from source databases. The element grid is defined as formal data cube concept in literature the CUBE opera- kernel application concept and is based on the elements tor is considered [7]. In addition to the CUBE operator dim and chunk by which the grid file concept is mod- in [7] the operator ROLLUP is produced as a special elled. To model the concept of stripe of a grid file we variety of the CUBE operator which produces the addi- introduced an empty element stripe which is described tional aggregated information only if they aggregate by means of five attributes: ref_to_chunk, min_val, over a tail of the sequence of grouping attributes. To max_val, rec_cnt and chunk_cnt. The values of attrib- support these operators we introduced cube and rollup utes ref_to_chunk are pointers to chunks crossed by symbols correspondingly. In this context, it is assumed each stripe. By means of min_val (lower boundary) and that all independent attributes are grouping attributes. max_val (upper boundary) attributes we define "widths" For some dimensions there are many degrees of granu- of the stripes. The values of attributes rec_cnt and larity that could be chosen for a grouping on that di- chunk_cnt are the total number of records in a stripe and mension. When the number of choices for grouping the number of chunks that are crossed by it correspond- along each dimension grows, it becomes non-effective ingly. To model the concept chunk we introduced an to store the results of aggregating based on all the sub- element chunk which is based on the empty element avg sets of groupings. Thus, it becomes reasonable to intro- and is described by means of four attributes: id of type duce materialized views. ID, qty, ref_to_db and ref_to_chunk. Values of attrib- All utes ref_to_db and ref_to_chunk are pointers to data All blocks and other chunks, correspondingly. Value of Years attribute qty is the number of different points of the considered chunk for fixed dimension. Element avg is State described by means of two attributes: value and dim. Values of value attributes are used during reorganiza- Quarters tion of the grid file and contain the average coordinates of points, corresponding to records of the considered City chunk, for each dimension. Value of attribute dim is the name of the corresponding dimension. To model the Weeks Months concept of dimension of a grid file we introduced an element dim which is based on the empty element stripe Days Dealer and has a single attribute name: i. e. the dimension name. Figure 7 Examples of lattices partitions for time inter- vals and automobile dealers Data cube. Materialized integration of data assumes the creation of data warehouses. Our approach to create Materialized views. A materialized view is the result data warehouses is mainly oriented to support data cu- of some query which is stored in the database, and bes. Using data warehousing technologies in OLAP which does not contain all aggregated values. To model applications is very important [5]. Firstly, the data the materialized view concept we introduce an element warehouse is a necessary tool to organize and centralize mview which is interpreted by means of an element corporate information in order to support OLAP queries view, and the last is based on the kernel attribution con- (source data are often distributed in heterogeneous cept. When implementing the query over hierarchical sources). Secondly, significant is the fact that OLAP dimension, a problem to choose an effective material- queries, which are very complex in nature and involve ized view arises. In other words, if we have aggregated large amounts of data, require too much time to perform values regarding to granularity Months and Quarters in a traditional transaction processing environment. To then for aggregation regarding to granularity on Years it model the data cube concept we introduced an element will be effective to apply query over materialized view cube which is interpreted by means of the following with granularity Quarters. As in [5], we also consider elements: felement, delement, fcube, rollup, mview and the lattice (a partially ordered set) as a relevant con- granularity. In typical OLAP applications, some collec- struction to formalize the hierarchical dimension. The tion of data called fact_table which represent events or lattice nodes correspond to the units of the partitions of objects of interest are used [5]. Usually, fact_table con- a dimension. In general, the set of partitions of a dimen- tains several attributes representing dimensions, and one sion is a partially ordered set. We say that partition P1 is or more dependent attributes that represent properties precedes partition P2, written P1 ≤ P2 if and only if there for the point as a whole. To model the fact_table con- is a path from node P1 to node P2. Based on the lattices cept we introduced an element felement which is based for each dimension we can define a lattice for all the on the kernel attribution concept. To model the concept possible materialized views of a data cube which are of dimension we introduced an element delement. This created by means of grouping according to some parti- element is based on the empty element element which is tion in each dimension. Let V1 and V2 be views, then V1 ≤ V2 if and only if for each dimension of V1 with parti- 211 tion P1 and analogous dimension of V2 with partition P2 holds P1 ≤ P2. Finally, let V be a view and Q be a query. We can implement this query over the considered view if and only if V ≤ Q. To model the concept of hierar- chical dimension we introduced an element granularity which is based on an empty element partition, and the latter is described by means of attribute name. The val- ue of attribute name is the name of the granularity. Be- low, an example of data cube for an automobile compa- ny database is adduced [5] which is an instance of con- tent dictionary of data integration concept. We consider pendent attribute: Value Set of partitions of dimension schema definition of Sales Figure 8 DTD for formalization of the data integration concept definition of materialized view Sales1 4 Conclusion The data integration concept formalization prob- definition of materialized view Sales2 lems were investigated. The outcome of this investiga- tion is a definition language of integrable data, which is based on the formalization of the data integration con- cept using a mechanism of the content dictionaries of the OPENMath. Supporting the concept of data integra- tion is achieved by the creation of content dictionaries, each of which contains formal definitions of concepts of a specific area of databases. The data integration concept is represented as a set of XML DTDs which are based on the OPENMath for- malism. By means of such DTDs were formalized the basic concepts of database theory, metadata about these concepts and the data integration concept. Within our approach to data integration, an integrated schema is The detailed discussion of the issues connected with represented as an XML document which is an instance applying the query language to integrated data is be- of an XML DTD of the data integration concept. Thus, yond the topic of this paper. Below the XML- modelling of the integrated data based on the OPEN- formalization of data integration concept is presented. Math formalism leads to the creation of the correspond- ing XML DTDs. By means of the developed content dictionary of the data integration concept we are modelling the medi- ator and the data warehouse concepts. The considered approach provides virtual and materialized integration of data as well as the possibility to support data cubes with hierarchical dimensions. Within our concept of 212 data cube, the operators CUBE and ROLLUP are im- Models. In Proc. of the 16th East European Con- plemented. If necessary, in data integrated schemas new ference. LNCS 7503, pp. 223-239 (2012) super-aggregate operators can be define. We use a com- [13] Luo, C., Hou, W. C., Wang, C. F., Want H., Yu, putationally complete language to create schemas of X.: Grid File for Efficient Data Cube Storage. integrated data. Applying the query language to the Computers and their Applications, pp. 424-429 integrated data is generated a reduction problem. Sup- (2006) porting the query language over such data requires addi- [14] Manukyan, M. G.: Extensible Data Model. In tional investigations. ADBIS’08, pp. 42-57 (2008) Finally, modern trends of the development of data- [15] Manukyan, M. G., Gevorgyan, G. R.: An Ap- base systems lead to the application of different divi- proach to Information Integration Based on the sions of mathematics to data analysis and management. AMN Formalism. In First Workshop on Pro- In the frame of our approach to data integration, this gramming the Semantic Web. Available: leads to the use of corresponding content dictionaries of https://web.archive.org/web/20121226215425/http the OPENMath. ://www.inf.puc-rio.br/~psw12/program.html, pp. 1-13 (2012) References [16] Manukyan, M. G.: Canonical Data Model: Con- [1] Abrial, J.-R.: The B-Book: Assigning programs to struction Principles. In iiWAS’14, pp. 320-329, meaning. Cambridge University Press (1996) ACM (2014) [2] Briukhov, D. O., Vovchenko, A. E., Zakha- [17] Manukyan, M. G., Gevorgyan, G. R.: Canonical rov, V. N., Zhelenkova, O. P., Kalinichen- Data Model for Data Warehouse. In New Trends ko, L. A., Martynov, D. O., Skvortsov, N. A., in Databases and Information Systems, Stupnikov, S. A.: The Middleware Architecture of Communications in Computer and Information the Subject Mediators for Problem Solving over a Science, 637, pp. 72-79 (2016) Set of Integrated Heterogeneous Distributed In- [18] Nievergelt, J., Hinterberger, H.: The Grid File: An formation Resources in the Hybrid Grid- Adaptable, Symmetric, Multikey File Structure. Infrastructure of Virtual Observatories. Informat- ACM Transaction on Database Systems, 9 (1), ics and Applications, 2 (1), pp. 2-34, (2008) pp. 38-71 (1984) [3] Date, C. J.: An Introduction to Database Systems. [19] Papadopoulos, A. N., Manolopoulos, Y., The- Addison Wesley, USA (2004) odoridis, Y., Tsoras, V.: Grid File (and family). In [4] Drawar, M.: OpenMath: An overview. ACM SIG- Encyclopedia of Database Systems, pp. 1279-1282 SAM Bulletin, 34 (2), (2000) (2009) [5] Garcia-Molina, H., Ullman, J., Widom, J.: Data- [20] Regnier, M.: Analysis of Grid File Algorithms, base Systems: The Complete Book. Prentice Hall, BIT, 25 (2), pp. 335-358 (1985) USA (2009) [21] Sharma, S., Tim, U. S., Wong, J., Gadia, S., Shar- [6] Gevorgyan, G. R., Manukyan, M. G.: Effective ma, S.: A Brief Review on Leading Big Data Algorithms to Support Grid Files. RAU Bulletin, Models. Data Science Journal, (13), pp. 138-157, (2), pp. 22-38 (2015) (2014). Doi: http/doi.org/10.2481/dsj.14-041 [7] Gray, J., Bosworth, A., Layman, A., Pirahesh, H.: [22] Stupnikov, S. A.: A Varifiable Mapping of a Mul- Data Cube: A Relational Aggregation Operator tidimensional Array Data Model into an Object Generalizing Group-By, Cross-Tab, and Sub-Tab. Data Model, Informatics and Applications, 7 (3), In ICDE, pp. 152-159 (1996) pp. 22-34 (2013) [8] Hindley, J. R., Seldin, J. P.: Introduction to Com- [23] Stupnikov, S. A, Vovchenko, A.: Combined Vir- binators and λ-Calculus. Cambridge University tual and Materialized Environment for Integration Press (1986) of Large Heterogeneous Data Collections. In Proc. [9] Kalinichenko, L. A.: Methods and Tools for of the RCDL 2014. CEUR Workshop Proceedings, Equivalent Data Model Mapping Construction. In 1297, pp. 339-348 (2014) EDBT, pp. 92-119, Springer (1990) [24] Stupnikov, S. A, Miloslavskaya, N. G., Budz- [10] Kalinichenko, L. A.: Integration of Heterogeneous ko, V.: Unification of Graph Data Models for Het- Semistructured Data Models in the Canonical One. erogeneous Security Information Resources' Inte- In RCDL, pp. 3-15 (1990) gration. In Proc. of the Int. Conf. on Open and Big [11] Kalinichenko, L. A., Stupnikov, S. A.: Construct- Data OBD 2015 (joint with 3rd Int. Conf. on Fu- ing of Mappings of Heterogeneous Information ture Internet of Things and Cloud, FiCloud 2015). Models into the Canonical Models of Integrated IEEE 2015, pp. 457-464 (2015) Information Systems. In Proc. of the 12th East- [25] Zakharov, V. N., Kalinichenko, L. A., Sokolov, I. A., European Conference, pp. 106-122 (2008) Stupnikov, S. A.: Development of Canonical Infor- [12] Kalinichenko, L., Stupnikov, S.: Synthesis of the mation Models for Integrated Information Systems. Canonical Models for Database Integration Pre- Informatics and Applications, 1 (2), pp. 15-38 (2007) serving Semantics of the Value Inventive Data [26] MongoDB. https://www.mongodb.org 213