<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On a Conceptual Data Model with Orientation to Data Integration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Manuk G. Manukyan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Yerevan State University</institution>
          ,
          <addr-line>Yerevan 0025</addr-line>
          ,
          <country country="AM">Armenia</country>
        </aff>
      </contrib-group>
      <fpage>39</fpage>
      <lpage>53</lpage>
      <abstract>
        <p>In this paper, a conceptual data model oriented to data integration is proposed. A formal definition of the considered conceptual data model is provided. To define the behavior of entities of the conceptual level, an algebra over such entities was developed. Formalization issues of the data integration concept are discussed. Principles of mapping the basic constructions of source data models into the conceptual data model are considered. The mapping from data sources into the conceptual schema is defined as an algebraic program.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Integration</kwd>
        <kwd>Data Warehouse</kwd>
        <kwd>Mediator</kwd>
        <kwd>Data Cube</kwd>
        <kwd>Ontology</kwd>
        <kwd>Reasoning Rules</kwd>
        <kwd>OPENMath</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        For more than forty years, the problems of data integration have been the subject
of research in the field of databases. Analysis of existing approaches to data
integration can be found in the works [
        <xref ref-type="bibr" rid="ref10 ref14 ref18">10, 14, 18</xref>
        ]. Basically, these works are devoted to the problems of integrating homogeneous data sources. Typically, an extended relational or object data model was used as the target data model. Constructing a mapping from arbitrary source data models into a target data model assumes using an extensible data model as the target one. In this connection, using the XML data model as the target one is preferred, as this model is a compromise between conventional and semistructured data models. In contrast to:
– the semistructured data model, the concept of a database schema in the sense of conventional data models is supported;
– the hard schemas of conventional data models, there is the possibility to define more flexible database schemas.
      </p>
      <p>Since the end of the last century, an approach to ontology-based data integration has appeared. Such an approach makes it possible to provide ontology-based data access. The current interest in data integration is basically connected with the challenges of big data processing. Analysis and management of big data assume the consideration of structured as well as semistructured and unstructured data sources. In this context, it becomes important to combine conventional and non-conventional approaches to data integration. For instance, machine learning techniques can be applied to unstructured and semistructured data sources to obtain structured data from them.</p>
      <p>
        In the frame of this paper the issues of constructing an extensible
conceptual data model with orientation to data integration are considered. To model
conceptual entities an extensible formalism (OPENMath) is used [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Conceptual schema is defined as a collection of OPENMath objects. The extensibility of OPENMath allows constructing a mapping from arbitrary data sources into the conceptual schema. A distinguishing feature of the proposed approach to data integration is also the possibility of creating and accessing the integrated databases by means of conceptual level entities. Thus, it is possible to define an integrated database in the concepts of a subject domain using the rich apparatus of computational mathematics, which is of no small importance when processing big data.
      
      <p>The paper is organized as follows: An extensible formalism to model conceptual entities is considered briefly in Section 2. A formal definition of an extensible conceptual data model is proposed in Section 3. A formalization of the data integration concept is offered in Section 4. Some issues of mapping from data sources into the extensible conceptual data model are discussed in Section 5. Related work is presented in Section 6. The conclusion is provided in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>Formal Bases</title>
      <p>This section provides a brief analysis of the OPENMath concept.</p>
      <sec id="sec-2-1">
        <title>OPENMath Objects</title>
        <p>
          OPENMath is an extensible formalism oriented to representing semantic information on mathematical objects. Formally, an OPENMath object is a labeled tree whose leaves are basic OPENMath objects. Examples of basic OPENMath objects are: Integer (integers in the mathematical sense, with no predefined range), Symbol (a mathematical concept) and Variable (meant to denote parameters). The compound objects are defined in terms of binding and application of the λ-calculus [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. The following recursive rules for constructing compound
OPENMath objects are proposed:
– Basic OPENMath objects are OPENMath objects.
– If A1, A2, ..., An (n ≥ 1) are OPENMath objects, then
        </p>
        <p>application(A1, A2, ..., An) is an OPENMath application object.
– If S1, S2, ..., Sn are OPENMath symbols, and A, A1, A2, ..., An (n ≥ 1) are
OPENMath objects, then attribution(A, S1 A1, S2 A2, ..., Sn An) is an
OPENMath attribution object and A is the object stripped of attributions.
– If B and C are OPENMath objects, and v1, v2, ..., vn (n ≥ 0) are
OPENMath variables or attributed variables, then binding(B, v1, v2, ..., vn, C) is
an OPENMath binding object.</p>
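        <p>The three construction rules above can be read as an algebraic data type. The following Python sketch is our illustrative encoding of that reading; the class and field names are our own and are not part of the OPENMath standard:</p>

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Basic OPENMath objects: the leaves of the labeled tree.
@dataclass(frozen=True)
class Integer:
    value: int

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Variable:
    name: str

# Compound objects, one class per recursive construction rule.
@dataclass(frozen=True)
class Application:
    args: Tuple["OMObject", ...]  # application(A1, ..., An), n >= 1

@dataclass(frozen=True)
class Attribution:
    obj: "OMObject"  # A, the object stripped of attributions
    pairs: Tuple[Tuple[Symbol, "OMObject"], ...]  # the (Si, Ai) pairs

@dataclass(frozen=True)
class Binding:
    binder: "OMObject"               # B
    variables: Tuple[Variable, ...]  # v1, ..., vn (n >= 0)
    body: "OMObject"                 # C

OMObject = Union[Integer, Symbol, Variable, Application, Attribution, Binding]

# Example from Section 2.2: application(sin, attribution(v, type real))
typed_sin = Application((
    Symbol("sin"),
    Attribution(Variable("v"), ((Symbol("type"), Symbol("real")),)),
))
```
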
        <p>OPENMath objects have the expressive power to cover all areas of computational mathematics.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Types in OPENMath</title>
        <p>A type system is built from basic types which are predefined as typed OPENMath objects (for example, integer, string, boolean, etc.) and the following rules by means of which typed OPENMath objects are constructed:
Attribution rule. If v is an OPENMath variable and t is a typed OPENMath object, then attribution(v, type t) is a typed OPENMath object. It denotes a variable with type t. The following is an example of a typed OPENMath application object for the trigonometric function sin v:
application(sin, attribution(v, type real))</p>
        <p>Application rule. If F and A are typed OPENMath objects, then application(F, A) is a typed OPENMath object.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Semantic Level</title>
        <p>OPENMath is implemented as an XML application. Its syntax is defined by the syntactical rules of XML; its grammar is partially defined by its own DTD (Document Type Definition). Only the syntactical validity of an OPENMath object's representation can be provided on the DTD level. To check semantics, in addition to the general rules inherited by XML applications, the considered application defines new syntactical rules. This is achieved by means of the introduction of the signature files concept (semantical constraints), in which these rules are defined. Signature files contain the signatures of basic concepts defined in some content dictionary and are used to check the semantic validity of their representations. Content dictionaries are used to assign formal and informal semantics to all symbols (concepts) used in OPENMath objects. A content dictionary is a collection of related symbols, encoded in XML format and fixing the "meaning" of concepts independently of the application.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>A Conceptual Data Model</title>
      <p>To construct a conceptual data model with orientation to data integration, we rely on the concept of a hierarchical relation as an entity of the conceptual level. Below we introduce the definitions of hierarchical relation schema and hierarchical relation. These definitions can be considered as a strengthening of the definitions of relation schema and relation from relational databases. Namely, unlike relational databases, we allow the use of semistructured data model constructions during subject domain modeling.</p>
      <sec id="sec-3-1">
        <title>Formalization of Hierarchical Relations</title>
        <p>Definition 1 A hierarchical relation schema X is an attribution object and is
interpreted by a finite set of attribution objects {A1, A2, . . . , An}. Corresponding
to each attribution object Ai is a set Di (a finite, non-empty set), 1 ≤ i ≤ n,
called the domain of Ai.</p>
        <p>In the frame of the proposed extensible conceptual data model, the following OPENMath representation of a hierarchical relation schema X is accepted:
attribution(X, type A, S1 A1, S2 A2, . . . , Sk Ak), k ≥ 0
Here S1, S2, ..., Sk are OPENMath symbols, X is an OPENMath variable, and A, A1, A2, ..., Ak are OPENMath objects. In this representation, X is the name of the attribution object, A represents the type of the attribution object (basic or composite), and each pair ⟨Si Ai⟩ defines one property of the modeled object (1 ≤ i ≤ k). To construct composite types, we introduced the following type constructors: sequence, choice and all. The conceptual entities that are created using these type constructors have semantics analogous to the sequence, choice and all elements of the XML Schema language. The arguments of these functions are typed attribution objects. We distinguish two types of typed attribution objects: basic and composite. A composite attribution object is defined by a type constructor. The following is an example of a composite attribution object:
attribution(X, type application(sequence, A1, A2, . . . , An))
where Ai (1 ≤ i ≤ n) is a basic or composite attribution object. In the case of a basic attribution object the value of the type symbol is a basic type (for example, integer, string, etc.). Below an XML Schema element definition and its equivalent representation in the frame of the conceptual data model (see Fig. 1) are considered:
&lt;xs:element name="Book"&gt;
&lt;xs:complexType&gt;
&lt;xs:sequence&gt;
&lt;xs:element name="Author" type="xs:string"/&gt;
&lt;xs:element name="Title" type="xs:string"/&gt;
&lt;xs:element name="Publisher" type="xs:string"/&gt;
&lt;/xs:sequence&gt;
&lt;/xs:complexType&gt;
&lt;/xs:element&gt;
Definition 2 Let D = D1 ∪ D2 ∪ ... ∪ Dn. A hierarchical relation x on hierarchical relation schema X is a finite set of mappings {t1, t2, ..., tk} from X to D with the restriction that for each mapping t ∈ x, t[Ai] must be in Di, 1 ≤ i ≤ n. The mappings are called hierarchical tuples or simply tuples.</p>
        <p>Fig. 1. Tree representation of the Book element as an attribution object: attribution(Book, type application(sequence, attribution(Author, type string), attribution(Title, type string), attribution(Publisher, type string))).</p>
        <p>
A hierarchical relation is an instance of a hierarchical relation schema. The following are examples of instances of the above considered hierarchical relation schema:
&lt;Book&gt;
&lt;Author&gt;David Maier&lt;/Author&gt;
&lt;Title&gt;The Theory of Relational Databases&lt;/Title&gt;
&lt;Publisher&gt;Computer Science Press&lt;/Publisher&gt;
&lt;/Book&gt;
or
{
"Book": {
"Author": "David Maier",
"Title": "The Theory of Relational Databases",
"Publisher": "Computer Science Press"
}
}</p>
        <p>Definition 3 A key of a hierarchical relation x on hierarchical relation schema X is a minimal subset K of X such that for any distinct tuples t1, t2 ∈ x, t1[K] ≠ t2[K].</p>
        <p>We added the symbols X, x, tuple and key to the conceptual data model to formalize the concepts of hierarchical relation schema, hierarchical relation, tuple and key correspondingly. A database schema S is a finite set of schemas of hierarchical relations. A database D on database schema S is a collection of hierarchical relations {x1, x2, ..., xn} such that for each hierarchical relation schema s ∈ S there is a hierarchical relation x ∈ D such that x is a hierarchical relation with schema s that satisfies every constraint defined in s. We introduce a symbol d to the conceptual data model to denote the set of all hierarchical relations expressible in the frame of this model.</p>
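        <p>Definitions 2 and 3 can be illustrated with a small executable sketch. We assume here, purely for illustration, that tuples are encoded as Python dictionaries from attribute names to domain values; the data and helper names are our own:</p>

```python
from itertools import combinations

# A hierarchical relation: a finite set of tuples (mappings from attributes to values).
book_relation = [
    {"Author": "David Maier", "Title": "The Theory of Relational Databases",
     "Publisher": "Computer Science Press"},
    {"Author": "Serge Abiteboul", "Title": "Foundations of Databases",
     "Publisher": "Addison-Wesley"},
]

def is_key(relation, attrs):
    """No two distinct tuples may agree on all attributes in attrs."""
    projections = [tuple(t[a] for a in attrs) for t in relation]
    return len(set(projections)) == len(projections)

def is_minimal_key(relation, attrs):
    """Definition 3: a key must also be minimal, i.e. no proper subset is a key."""
    if not is_key(relation, attrs):
        return False
    return not any(is_key(relation, list(sub))
                   for r in range(1, len(attrs))
                   for sub in combinations(attrs, r))

print(is_key(book_relation, ["Title"]))                    # True: titles are distinct
print(is_minimal_key(book_relation, ["Title", "Author"]))  # False: {Title} is already a key
```
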
      </sec>
      <sec id="sec-3-2">
        <title>Algebra of Hierarchical Relations</title>
        <p>In fact, a data model defines operations over data. In our case, a unit for data manipulation is a hierarchical relation (a set). Below the following operations over hierarchical relations are proposed:</p>
        <p>To support the n-ary union of sets, we introduced the symbol union. This symbol is used to denote the associative/commutative union operation on sets:
union : x × x → d
To support the n-ary join of sets, we introduced the symbol join. This symbol is used to denote the associative/commutative join operation on sets:
join : x × x → d
The symbol minus is used to denote the difference of sets:
minus : x × x → d
To support a filtering operation, we introduced the symbol σ. This symbol is used to denote a select operation on the set:
σ : {x → {p : {tuple} → boolean}} → d
To support a projection operation, we introduced the symbol π. This symbol is used to denote a unary operation on the set:
π : x[name∗] → d
Here name denotes the name of an attribution object and is defined as follows:
name : {Attribution} → string
For integrating data, aggregate functions play a significant role. We introduced the min, max, count, sum and avg symbols to support the corresponding aggregate functions of the relational algebra. Let f ∈ {avg, sum, count, max, min}, then</p>
        <p>f : x[name] → numerical value
Often, we need to consider the tuples of a hierarchical relation in groups. For this purpose, we introduced a grouping symbol γ. This symbol is used to denote a unary operation on the set:
γ : x[name∗ (, f : (tuple[name∗])∗ → numerical value)∗ ] → d</p>
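        <p>The operations above can be sketched in Python under the simplifying assumption that a hierarchical relation is a list of flat tuples encoded as dictionaries; σ, π and γ are rendered as select, project and group_by, the n-ary operations are shown in binary form, and all names are our illustrative choices:</p>

```python
def union(x, y):
    """Associative/commutative union of two hierarchical relations."""
    return x + [t for t in y if t not in x]

def minus(x, y):
    """Difference of sets: tuples of x absent from y."""
    return [t for t in x if t not in y]

def join(x, y):
    """Natural join on the attribute names shared by both relations."""
    result = []
    for t in x:
        for u in y:
            common = set(t) & set(u)
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})
    return result

def select(x, p):
    """sigma: keep the tuples satisfying predicate p."""
    return [t for t in x if p(t)]

def project(x, names):
    """pi: restrict every tuple to the listed attribute names."""
    return [{a: t[a] for a in names} for t in x]

def group_by(x, names, f, target):
    """gamma: group tuples by `names` and aggregate the `target` attribute with f."""
    groups = {}
    for t in x:
        groups.setdefault(tuple(t[a] for a in names), []).append(t[target])
    return [dict(zip(names, k)) | {target: f(vs)} for k, vs in groups.items()]

sales = [{"city": "Yerevan", "amount": 10}, {"city": "Yerevan", "amount": 5},
         {"city": "Gyumri", "amount": 7}]
print(group_by(sales, ["city"], sum, "amount"))
# [{'city': 'Yerevan', 'amount': 15}, {'city': 'Gyumri', 'amount': 7}]
```
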
      </sec>
      <sec id="sec-3-3">
        <title>Conceptual Schema</title>
        <p>The conceptual schema is an instance of the conceptual data model and is intended for formal knowledge representation in the form of a set of concepts of some subject domain and relations between them. Such representations are used for reasoning about entities of the subject domains, as well as for the description of the domains. The conceptual schema is defined as a collection of typed attribution objects. Our concept of data integration assumes constructing a mapping from arbitrary data sources into the conceptual schema. Mapping generation from data sources into the conceptual schema assumes generating queries over data sources and transforming these data into a set of hierarchical relations. A distinguishing feature of the conceptual level is its stratification into local and global levels to model the hierarchical relations. On the local level a homogeneous representation of data sources is provided. The global level is intended to support various data integration technologies, such as data warehouses, mediators, etc. The entities of the global conceptual level are defined by algebraic programs. To support entities of the conceptual level we use the mechanisms of content dictionaries and signature files of OPENMath (for more details, see Appendices A and B).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Data Integration Concept Formalization</title>
      <p>Our approach to data integration assumes formalizing the mediator, data warehouse and data cube concepts in the frame of the proposed conceptual data model. The results of formalizing these concepts are the so-called reasoning rules of the conceptual level to support the data integration concept. These rules are interpreted by means of algebraic programs. Supporting the reasoning rules assumes developing new content dictionaries to assign informal and formal semantics to the basic concepts of data integration (for instance, hierarchical relation, hierarchical relation schema, algebraic operations, etc.). Formalization of basic concepts is achieved by applying the concept of OPENMath content dictionaries. We should also formalize the signatures of these basic concepts for checking the semantic validity of their representations. In this case, we will be using the OPENMath signature files concept to formalize the signatures of basic concepts.</p>
      <sec id="sec-4-1">
        <title>Mathematical Model</title>
        <p>The considered research object is the data integration concept. The proposed
formalization of this concept will be the mathematical basis for constructing the
reasoning rules.</p>
        <p>Mediator rule. Let sch and wrapper correspondingly denote the set of all
mediator schemas and the set of all subsets of the wrappers which are defined
on source data schemas to support the mediator concept, and let med denote
the set of all mediators, then
med ⊆ sch × wrapper</p>
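        <p>Read as a type, the mediator rule says that a mediator pairs a mediator schema with a set of wrappers defined on source data schemas. A minimal sketch of this reading, with all names being our illustrative choices:</p>

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Wrapper:
    source_schema: str  # the source data schema this wrapper is defined on

@dataclass(frozen=True)
class Mediator:
    schema: str                   # an element of sch
    wrappers: FrozenSet[Wrapper]  # an element of wrapper, i.e. a set of wrappers

# A mediator is then a pair (schema, wrappers) -- an element of sch x wrapper.
m = Mediator("GlobalBooks", frozenset({Wrapper("LibraryA"), Wrapper("LibraryB")}))
```
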
        <p>Warehouse rule. Let wsch and extractor correspondingly denote the set of all data warehouse schemas and the set of all subsets of the extractors which are defined on source data schemas to support the data warehouse concept, and let whse denote the set of all data warehouses, then
whse ⊆ wsch × extractor</p>
        <p>
          Cube rule. Our approach to create data warehouses is mainly oriented to
support data cubes. In typical OLAP applications, there is some collection of data
called the fact table which represents events or objects of interest [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Usually, a fact table contains several attributes representing dimensions, and one or more dependent attributes that represent properties for the point as a whole. The concept of the data cube assumes generation of the power set (the set of all subsets) of the aggregation attributes. For some dimensions, there are many degrees of granularity that could be chosen for a grouping on that dimension. When the number of choices for grouping along each dimension grows, it becomes ineffective to store the results of aggregating based on all the subsets of groupings. Thus, it becomes reasonable to introduce materialized views. The materialized view is interpreted by the OPENMath application concept.
        </p>
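        <p>The power set of aggregation attributes mentioned above can be generated directly. The following sketch, with our own illustrative data and names, enumerates all groupings of a small data cube over two dimensions:</p>

```python
from itertools import chain, combinations

def powerset(attrs):
    """All subsets of the aggregation attributes, from {} to the full set."""
    return chain.from_iterable(combinations(attrs, r) for r in range(len(attrs) + 1))

fact_table = [
    {"city": "Yerevan", "year": 2019, "amount": 10},
    {"city": "Yerevan", "year": 2020, "amount": 5},
    {"city": "Gyumri",  "year": 2019, "amount": 7},
]

# One aggregate per subset of dimensions: the full data cube for sum(amount).
cube = {}
for dims in powerset(["city", "year"]):
    grouped = {}
    for t in fact_table:
        key = tuple(t[d] for d in dims)
        grouped[key] = grouped.get(key, 0) + t["amount"]
    cube[dims] = grouped

print(cube[()])        # {(): 22} -- the grand total
print(cube[("city",)]) # {('Yerevan',): 15, ('Gyumri',): 7}
```
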
        <p>Let ssch, sgty and view correspondingly denote the set of all fact table
schemas which are defined on source data schemas, the set of all granularities
and the set of all materialized views to support the data cube concept, and let
cube denote the set of all data cubes in this context, then</p>
        <p>cube ⊆ ssch × sgty × view</p>
        <p>Let source denote the set of all hierarchical relation schemas and let dir
denote the set of all data integration reasoning rules, then</p>
        <p>dir ⊆ source × (med ∪ whse ∪ cube)</p>
      </sec>
      <sec id="sec-4-2">
        <title>Reasoning Rules</title>
        <p>The conceptual data model should be extensible to provide a reversible mapping from arbitrary source data models into the conceptual data model. One of the reasons for choosing OPENMath as a formalism to support the concept of data integration is its extensibility. The extension of the conceptual data model assumes introducing new concept(s) in the frame of this model. Thus, the extension of the conceptual data model leads to defining new symbols to support the data integration concept on the conceptual level. For applying a symbol on the conceptual level, the following rule is proposed:
Concept ← symbol OPENMath object</p>
        <p>To support entities of the local conceptual level, a source symbol is introduced to the conceptual data model. The following construction to define these entities is proposed:
attribution(Local, source S1, source S2, ..., source Sn)</p>
        <p>It is assumed that there are n source data (n ≥ 1). The value of source
symbol is a set of schemas of source data and each element of this set is defined
as an application object:
application(list, A1, A2, ..., Ak),
where Ai is a typed attribution object (1 ≤ i ≤ k).</p>
        <p>To support entities of global conceptual level the following reasoning rules
are considered.</p>
        <p>Warehouse rule. For supporting this rule a whse symbol is introduced to the conceptual data model. The following construction to define this rule is proposed:
attribution(Name, whse Algebraic Program)
The value of the whse symbol is an algebraic program (an application object) by means of which a mapping from data sources into a data warehouse is defined.</p>
        <p>Mediator rule. The following construction to define this rule is proposed:
attribution(Name, med Algebraic Program)
The proposed construct to model the mediator rule is interpreted analogously to the case of the data warehouse.</p>
        <p>Cube rule. For formalizing the concept of materialized view we introduced the following symbols to the conceptual data model: cube, dim, granularity and partition. A dimensional table schema is the value of a dim symbol and is also interpreted by a typed attribution object. The materialized view is generated by means of an algebraic program. To support hierarchical dimensions, we introduced the granularity and partition symbols. A hierarchical dimension is defined by an attribution object and is the value of a granularity symbol. The value of a partition symbol is the division units of a hierarchical dimension. The following construction to define a fact table is considered:
attribution(Fact, type application(sequence, A1, A2, ..., An), dim D1, dim D2, ..., dim Dm, granularity G1, granularity G2, ..., granularity Gk),
where Aj is a basic attribution object (1 ≤ j ≤ n), and Gi (1 ≤ i ≤ k) is an attribution object defined as follows:
attribution(Name, partition application(list, P1, P2, ..., Pl))
The following is a granularity concept example:
attribution(Date, partition application(list, Days, Months, ..., Quarters, Years))
The following construction to define the cube rule is proposed:
attribution(Name, cube Algebraic Program)</p>
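        <p>The fact-table construction above can be assembled mechanically. The following sketch builds the textual attribution object for a fact table with one hierarchical Date dimension; the helper functions and the Store/Product dimension names are our illustrative assumptions:</p>

```python
def application(head, *args):
    """Render an application object as text: application(head, arg1, ..., argN)."""
    return f"application({head}, {', '.join(args)})"

def attribution(name, *pairs):
    """Render an attribution object as text from (symbol, value) pairs."""
    body = ", ".join(f"{sym} {val}" for sym, val in pairs)
    return f"attribution({name}, {body})"

# Granularity example from the text:
# attribution(Date, partition application(list, Days, Months, Quarters, Years))
date_granularity = attribution(
    "Date", ("partition", application("list", "Days", "Months", "Quarters", "Years")))

# A fact table with two basic attributes, two dimensions and one granularity.
fact = attribution(
    "Fact",
    ("type", application("sequence", "A1", "A2")),
    ("dim", "Store"), ("dim", "Product"),
    ("granularity", date_granularity),
)
print(fact)
```
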
      </sec>
    </sec>
    <sec id="sec-5">
      <title>From Source Data Models to Conceptual Data Model</title>
      <p>This section discusses the rules for mapping basic concepts of structured,
semistructured, and also unstructured data models into conceptual data model. If
necessary, the conceptual model is expanded by adding new concepts (symbols) to
it.</p>
      <p>From aggregate models to the conceptual data model. According to the aggregate model, an aggregate is treated as an atomic unit of data and is presented as a collection of related objects. In the frame of the conceptual data model an aggregate is modeled by means of the following construction:
attribution(AGName, type application(typeConstructor, A1, A2, ..., Am)),
where Ai (1 ≤ i ≤ m) is a basic or composite attribution object.</p>
      <p>From the relational data model to the conceptual data model. Let R = {A1, A2, ..., An} be a relational schema. To model the considered schema the following construction is used:
attribution(R, type application(sequence, A1, A2, ..., An)),
where Ai (1 ≤ i ≤ n) is a basic attribution object.</p>
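      <p>This mapping rule is simple enough to automate. A sketch that emits the construction for a given relational schema; the function name and the typed-attribute encoding are our illustrative assumptions:</p>

```python
def relational_to_conceptual(name, attributes):
    """Map a relational schema R = {A1, ..., An} to its attribution object,
    following the rule attribution(R, type application(sequence, A1, ..., An)),
    where each attribute becomes a basic attribution object."""
    args = ", ".join(f"attribution({a}, type {t})" for a, t in attributes)
    return f"attribution({name}, type application(sequence, {args}))"

schema = relational_to_conceptual(
    "Book", [("Author", "string"), ("Title", "string"), ("Publisher", "string")])
print(schema)
```
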
      <p>From the graph data model to the conceptual data model. The graph data model allows presenting entities and relationships between these entities. Entities are also known as nodes, and relations are known as edges. To model a node the following construction is proposed:
attribution(NName, type application(set, P1, P2, ..., Pm), edge application(set, E1, E2, ..., En)),
where Pi (1 ≤ i ≤ m) is a basic attribution object (graph node property), and Ej (1 ≤ j ≤ n) is a composite attribution object (graph node outgoing edge).</p>
      <p>It is assumed that a node has m properties, n types of edges are outgoing from the node, and each type of edge is connected with kj nodes.</p>
      <p>From the XML data model to the conceptual data model. The basic concept of the XML data model which is used to model a subject domain is the XML-element. The following constructions are used to model an XML-element:
attribution(E, type basic type, attribute application(set, A1, A2, ..., Ak)), or
attribution(E, type application(typeConstructor, E1, E2, ..., Em), attribute application(set, A1, A2, ..., Ak)),
where Ai (1 ≤ i ≤ k) is a basic attribution object (XML-attribute), and Ej (1 ≤ j ≤ m) is a basic or composite attribution object (XML-element). We introduce a symbol attribute to model the concept of XML-attribute.</p>
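      <p>The second construction can be illustrated with a sketch that maps a composite XML-element with attributes; the function name, the Book element and the isbn attribute are our hypothetical examples:</p>

```python
def xml_element_to_conceptual(name, children, attributes):
    """Second construction: a composite XML-element with XML-attributes.
    children: list of (child_name, basic_type) pairs;
    attributes: list of (attr_name, basic_type) pairs."""
    elems = ", ".join(f"attribution({c}, type {t})" for c, t in children)
    attrs = ", ".join(f"attribution({a}, type {t})" for a, t in attributes)
    return (f"attribution({name}, type application(sequence, {elems}), "
            f"attribute application(set, {attrs}))")

print(xml_element_to_conceptual(
    "Book",
    [("Author", "string"), ("Title", "string")],
    [("isbn", "string")]))
```
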
      <p>From the object data model to the conceptual data model. The base concepts of the object data model are attributes, relationships and methods. The following construction is used to model the attribute concept:
attribution(AName, type basic type), or
attribution(AName, type application(typeConstructor, collectionType))</p>
      <p>We are using the following type constructors: set, bag, list, array, dictionary, and structure, to support the concepts of attribute and relationship. To model the relationship concept the following constructions are used:
attribution(RName, relationship application(typeConstructor, className) [inverse RName]), or
attribution(RName, relationship className [inverse RName])
To support the relationship concept, in addition to the considered type constructors, we also introduced the relationship and inverse symbols, which have obvious semantics.</p>
      <p>The following construction is proposed to model the method concept:
attribution(FName, arg application(list, A1, A2, ..., An), return Type),
where the value of the introduced symbol arg is a list of method arguments, Ai (1 ≤ i ≤ n) is a basic or composite attribution object (method argument), and the value of the introduced symbol return is the type of the value returned by the method.</p>
    </sec>
    <sec id="sec-6">
      <title>Related Work</title>
      <p>
        One of the first works in the area of justifiable data model mapping for heterogeneous database integration is [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In this paper, the concept of a reversible mapping of an arbitrary source data model into a target one (canonical model) was proposed. In particular, the following [
        <xref ref-type="bibr" rid="ref13 ref15">13, 15</xref>
        ] papers are devoted to the problems of heterogeneous database integration based on this concept.
      
      </p>
      <p>
        A significant contribution to the theory and practice of data integration was
made by the research group of M. Lenzerini (for instance, see [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1–5</xref>
        ]). Their investigations were carried out in the frame of the traditional approach to data integration as well as in the frame of the paradigm of ontology-based data access and integration. In these investigations only relational data are considered as source data. To define the ontology as well as the mapping between ontology and data sources, description logic is used. Finally, in the frame of these investigations, a formal approach to data quality is proposed. Namely, one of the most important dimensions of data quality (consistency) is considered. The following papers [
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ] can be considered as some development of the works of M. Lenzerini's group, in which SPARQL is considered as a query language on the ontology level. A SPARQL program is translated into an efficient (federated) SQL program over data sources based on the proposed optimisation techniques.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] a system to automatically generate direct mappings between relational
databases and given target ontologies is developed. The considered system is
based on an intermediate internal graph representation that allows the
representation of both factual knowledge and heuristically observed patterns from the
input. An approach to ontology-based data integration is proposed in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which also allows generating a mapping from arbitrary data sources into an ontology.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] an analysis of the use of machine learning techniques to solve different problems of integrating unstructured and semistructured data is provided, namely, the use of machine learning techniques for entity resolution, data fusion, data extraction and schema alignment. Thus, the use of machine learning techniques allows automating parts of different integration problems.
      </p>
      <p>Finally, about our approach to data integration: in the frame of this approach an extensible conceptual data model with orientation to data integration is proposed. It is important that the proposed approach allows generating a mapping from arbitrary data sources into the conceptual schema. Principles of mapping the basic constructions of source data models (structured, semistructured and unstructured) into the conceptual data model are considered. The mapping from data sources into the conceptual schema is defined as an algebraic program. A computationally complete language is used to model the subject domain. The modelling means of the conceptual level are insensitive to the extension of the conceptual data model. In other words, expanding the conceptual data model is reduced to introducing new concepts within this model using the OPENMath formalism. This property is the distinctive feature of the proposed approach to data integration.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>In this paper the issues of creating and accessing integrated databases by means of conceptual level entities are investigated. The outcome of this research is a conceptual data model with orientation to data integration. The proposed conceptual data model is based on the OPENMath formalism and is extensible. To support the conceptual data model, new content dictionaries in the frame of the OPENMath concept were constructed. The property of extensibility allows integrating arbitrary data sources by using a computationally complete language. Extension of the conceptual data model leads to defining new concepts to model integrable data on the conceptual level. We introduced the hierarchical relation concept as an entity of the conceptual level and formalized it based on the OPENMath concept. The conceptual schema is defined as a collection of OPENMath objects. To define the behavior of objects of the conceptual level, an algebra of hierarchical relations was developed. By means of algebraic programs, the so-called reasoning rules are interpreted. These rules are used to model the mediator, data warehouse and data cube concepts. The formal basis of these rules is the mathematical model of the data integration concept proposed by us. Principles of mapping the basic constructions of source data models into the conceptual data model are considered. It is important that the proposed approach to data integration allows generating a mapping from data sources into the conceptual schema.</p>
    </sec>
    <sec id="sec-8">
      <title>Appendix A: Basic Concepts Formalization</title>
      <p>Data integration concepts such as hierarchical relation, hierarchical relation
schema, and algebraic operations are mathematical concepts. Thus, it is natural
to use OPENMath content dictionaries to formalize them.
The content dictionaries define semantic information about the basic
concepts of data integration. A content dictionary representing the basic
concepts of a subject domain holds two kinds of information: one
that is common to all content dictionaries, and one that is specific to a
particular basic concept definition. The definition of a new basic concept includes
the name and description of the basic concept, as well as some optional information
about the concept. To support the basic concepts of data integration and the type
system of XML Schema, two content dictionaries have been developed. Below,
an example of a basic concept definition is considered:
&lt;CDDefinition&gt;
&lt;Name&gt; X &lt;/Name&gt;
&lt;Description&gt;
To support the concept of hierarchical relation schema, we introduce
the symbol X. Below we use the Attribution symbol, which has
been defined in OPENMath.
&lt;/Description&gt;
&lt;CMP&gt; X : Attribution∗ → {Attribution} &lt;/CMP&gt;
&lt;/CDDefinition&gt;
The XML elements used above have obvious interpretations; note only that the
element "CMP" contains the commented mathematical property of the defined
algebraic concept. Content dictionaries contain just one part of the information
that can be associated with a basic concept in order to define its
meaning and functionality stepwise. Specific information pertaining to the basic
concepts, such as their signatures, is defined separately in so-called signature
files. In other words, a signature file is used to formalize the formats of the
basic concepts.</p>
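<p>To make the two kinds of information concrete, the following hypothetical Python sketch (illustrative only; the class and field names are our assumptions, not part of the paper's implementation) models a content dictionary as a registry of basic concept definitions, each carrying the Name, Description, and CMP fields of the CDDefinition element above:

```python
# Hypothetical sketch: a content dictionary as a registry of
# basic concept definitions (all names are illustrative).
from dataclasses import dataclass, field

@dataclass
class CDDefinition:
    name: str             # required: the symbol being defined
    description: str      # required: informal meaning of the symbol
    cmp: str              # commented mathematical property
    extra: dict = field(default_factory=dict)  # optional information

class ContentDictionary:
    def __init__(self, cd_name):
        self.cd_name = cd_name      # information common to the whole CD
        self.definitions = {}       # per-concept definitions

    def define(self, definition):
        self.definitions[definition.name] = definition

    def lookup(self, name):
        return self.definitions[name]

# The hierarchical-relation-schema constructor X from the example above.
cd = ContentDictionary("data_integration")
cd.define(CDDefinition(
    name="X",
    description="Constructor of a hierarchical relation schema.",
    cmp="X : Attribution* -> {Attribution}",
))
print(cd.lookup("X").cmp)   # prints: X : Attribution* -> {Attribution}
```

The signature of X itself is not stored here; as described in the next section, formats are kept in separate signature files.</p>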
    </sec>
    <sec id="sec-9">
      <title>Semantical Constraints</title>
      <p>
        As mentioned above, to check the semantic validity of the representations of
the basic concepts, we associate extra information with content dictionaries in the
form of signature files. A signature file contains the signature definitions of all
basic concepts of the content dictionary associated with it. We
use the Small Type System [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to formalize the basic concept signatures. Below, the
definition of the signature of the symbol X considered above is provided:
&lt;Signature name = "X"&gt;
&lt;OMOBJ&gt;
&lt;OMA&gt;
&lt;OMS name = "mapsto" cd = "sts"/&gt;
&lt;OMA&gt;
&lt;OMS name = "nary" cd = "sts"/&gt;
&lt;OMS name = "attribution" cd = "sts"/&gt;
&lt;/OMA&gt;
&lt;OMS name = "attribution" cd = "sts"/&gt;
&lt;/OMA&gt;
&lt;/OMOBJ&gt;
&lt;/Signature&gt;
In this definition, the symbols mapsto and nary are defined in
OPENMath (in the sts content dictionary). The symbol mapsto represents the
construction of a function type: the first n-1 children denote the types of the
arguments, and the last denotes the return type. The symbol nary constructs a
child of mapsto which denotes an arbitrary number of copies of the argument of
nary. The operator is associative on these arguments, which means that repeated
uses may be flattened or unflattened.
      </p>
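<p>The type discipline described above can be sketched operationally. The following hypothetical Python fragment (illustrative only; not the paper's implementation) encodes the signature mapsto(nary(attribution), attribution) of the symbol X and checks an application against it, flattening the n-ary parameter as the standard allows:

```python
# Hypothetical sketch of checking an application against an STS-style
# signature built from the symbols "mapsto" and "nary".

def check_application(signature, argument_types):
    """Return the result type if argument_types match the signature."""
    head, children = signature
    assert head == "mapsto"          # the outermost symbol must be mapsto
    param_types, result_type = children[:-1], children[-1]
    remaining = list(argument_types)
    for p in param_types:
        if isinstance(p, tuple) and p[0] == "nary":
            # ("nary", t) matches zero or more arguments of type t
            while remaining and remaining[0] == p[1]:
                remaining.pop(0)
        elif not remaining or remaining.pop(0) != p:
            raise TypeError("argument type mismatch")
    if remaining:
        raise TypeError("unexpected extra arguments")
    return result_type

# Signature of X: any number of attributions map to an attribution.
x_signature = ("mapsto", [("nary", "attribution"), "attribution"])

print(check_application(x_signature, ["attribution"] * 3))
# prints: attribution
```

</p>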
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Tractable reasoning and efficient query answering in description logics: The DL-Lite family</article-title>
          .
          <source>JAR</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>385</fpage>
          -
          <lpage>429</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Ontology-based data access and integration</article-title>
          .
          <source>In Encyclopedia of Database Systems</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Vardi</surname>
          </string-name>
          .
          <article-title>Query processing under GLAV mappings for relational and graph databases</article-title>
          .
          <source>In PVLDB</source>
          , pages
          <fpage>61</fpage>
          -
          <lpage>72</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez-Muro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruzzi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Savo</surname>
          </string-name>
          .
          <article-title>The MASTRO system for ontology-based data access</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>43</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Console</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          .
          <article-title>Data quality in ontology-based data access: The case of consistency</article-title>
          .
          <source>In Twenty-Eighth AAAI Conference on Artificial Intelligence</source>
          , pages
          <fpage>1020</fpage>
          -
          <lpage>1026</lpage>
          .
          <source>AAAI</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Davenport</surname>
          </string-name>
          .
          <article-title>A small OpenMath type system</article-title>
          .
          <source>ACM SIGSAM Bulletin</source>
          ,
          <volume>34</volume>
          (
          <issue>2</issue>
          ):
          <fpage>16</fpage>
          -
          <lpage>21</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Dewar</surname>
          </string-name>
          .
          <article-title>OpenMath: An overview</article-title>
          .
          <source>ACM SIGSAM Bulletin</source>
          ,
          <volume>34</volume>
          (
          <issue>2</issue>
          ):
          <fpage>2</fpage>
          -
          <lpage>5</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Dong</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Rekatsinas</surname>
          </string-name>
          .
          <article-title>Data integration and machine learning: A natural synergy</article-title>
          .
          <source>VLDB</source>
          ,
          <volume>11</volume>
          (
          <issue>12</issue>
          ):
          <fpage>2094</fpage>
          -
          <lpage>2097</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ullman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>Database Systems: The Complete Book</article-title>
          . Prentice Hall, USA,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>B.</given-names>
            <surname>Golshan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mihaila</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>Data integration: After the teenage years</article-title>
          .
          <source>In PODS'17</source>
          , pages
          <fpage>101</fpage>
          -
          <lpage>106</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Hindley</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Seldin</surname>
          </string-name>
          .
          <article-title>Introduction to Combinators and λ-Calculus</article-title>
          . Cambridge University Press, Great Britain,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Kalinichenko</surname>
          </string-name>
          .
          <article-title>Data model transformation method based on axiomatic data model extension</article-title>
          .
          <source>In VLDB</source>
          , pages
          <fpage>549</fpage>
          -
          <lpage>555</lpage>
          . Springer,
          <year>1978</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Kalinichenko</surname>
          </string-name>
          .
          <article-title>Methods and tools for equivalent data model mapping construction</article-title>
          .
          <source>In Advances in Database Technology-EDBT'90</source>
          , pages
          <fpage>92</fpage>
          -
          <lpage>119</lpage>
          . Italy, Springer, March
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Kalinichenko</surname>
          </string-name>
          .
          <article-title>Effective support of databases with ontological dependencies: Relational languages instead of description logics</article-title>
          .
          <source>Programmirovanie</source>
          ,
          <volume>38</volume>
          (
          <issue>6</issue>
          ):
          <fpage>315</fpage>
          -
          <lpage>326</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Manukyan</surname>
          </string-name>
          .
          <article-title>On an approach to data integration: Concept, formal foundations and data model</article-title>
          .
          <source>In CEUR-WS</source>
          , volume
          <volume>2022</volume>
          , pages
          <fpage>206</fpage>
          -
          <lpage>213</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Manukyan</surname>
          </string-name>
          .
          <article-title>An ontology approach to data integration</article-title>
          .
          <source>In CEUR-WS</source>
          , volume
          <volume>2790</volume>
          , pages
          <fpage>33</fpage>
          -
          <lpage>47</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>C.</given-names>
            <surname>Pinkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Binnig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jimenez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schwarte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Heupel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Kraska</surname>
          </string-name>
          .
          <article-title>IncMap: A journey towards ontology-based integration</article-title>
          .
          <source>In BTW 2017</source>
          , pages
          <fpage>145</fpage>
          -
          <lpage>164</lpage>
          . Lecture Notes in Informatics,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>Ontology-based data access: A survey</article-title>
          .
          <source>In IJCAI-18</source>
          , pages
          <fpage>5511</fpage>
          -
          <lpage>5519</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hovland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bilidas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rezk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Giese</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          .
          <article-title>Efficient ontology-based data integration with canonical IRIs</article-title>
          .
          <source>In ESWC 2018</source>
          , pages
          <fpage>697</fpage>
          -
          <lpage>713</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cogrel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Botoeva</surname>
          </string-name>
          .
          <article-title>Efficient handling of SPARQL OPTIONAL for OBDA</article-title>
          .
          <source>In ISWC 2018</source>
          , pages
          <fpage>354</fpage>
          -
          <lpage>373</lpage>
          . Springer,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>